Abstract

A random network model which allows for tunable, quite general forms of clus- 
tering, degree correlation and degree distribution is defined. The model is an ex- 
tension of the configuration model, in which stubs (half-edges) are paired to form a 
network. Clustering is obtained by forming small completely connected subgroups, 
and positive (negative) degree correlation is obtained by connecting a fraction of 
the stubs with stubs of similar (dissimilar) degree. An SIR (Susceptible → Infective
→ Recovered) epidemic model is defined on this network. Asymptotic properties of
both the network and the epidemic, as the population size tends to infinity, are de- 
rived: the degree distribution, degree correlation and clustering coefficient, as well 
as a reproduction number R*, the probability of a major outbreak and the relative 
size of such an outbreak. The theory is illustrated by Monte Carlo simulations and 
numerical examples. The main findings are that clustering tends to decrease the 
spread of disease, the effect of degree correlation is appreciably greater when the 
disease is close to threshold than when it is well above threshold, and disease spread
broadly increases with degree correlation ρ when R* is just above its threshold value
of one and decreases with ρ when R* is well above one.

Keywords: Branching process, configuration model, epidemic size, random graph, SIR 
epidemic, threshold behaviour. 

1 Introduction

Ever since the pioneering work of Erdős and Rényi (1959) on a simple random graph there
have been numerous important contributions on random graph models with the aim of
making them more flexible and realistic. For example, the configuration model (Molloy
and Reed (1995) and Newman et al. (2001)) defines a network allowing for more or less
arbitrary degree distribution F_D, the distribution describing the number of neighbours
D of a randomly selected node (which in the epidemic context represents an individual) 
in the network. (For simplicity, from now on we refer to D as the degree distribution.) 
This extension was important for two reasons: most empirical networks tend to have 
much heavier tailed degree distributions than the Poisson distribution of the Erdős-Rényi
(E-R) network, and networks with heavy tail degree distributions have been shown to
exhibit rather different properties when compared with the E-R network; for example, if
an epidemic outbreak takes place on the network the epidemic threshold R_0 is much higher
(or even infinite) as compared to the same epidemic taking place on an E-R network with 
the same mean degree (Andersson (1999)). 

Two other properties of real world networks that are not present in E-R networks are 
clustering and degree correlation. The clustering coefficient c measures how likely it is 
that two neighbours of a randomly selected node are neighbours themselves. The E-R 
network has no clustering whereas nearly all empirical networks have positive clustering, 
with typical values in the range 0.1-0.5 out of the possible range 0-1 (see Newman (2003), 
Table 3.1). The degree correlation ρ instead measures the correlation between the degrees
of the adjacent individuals of a randomly selected edge. The E-R network has ρ = 0
whereas 'random' networks with heavy tail degree distribution tend to have ρ > 0 (van
der Hofstad and Litvak (2012)). Empirical networks, on the other hand, have both positive
and negative degree correlation: there seems to be a tendency for computer networks to
have ρ < 0 whereas social networks (our main interest) typically have ρ > 0 (see Newman
(2003), Table 3.1). There are numerous network models studied in the literature, with 
the aim of allowing one or several of these three extensions (of local properties) from 
the original E-R-network (see some references below where the focus is also on epidemics 
evolving on the network); the term 'local' refers to the fact that it is sufficient to observe 
nodes and their neighbourhoods to determine/estimate such properties (the complete 
network need not be observed in order to evaluate them). The current paper defines a 
model in which D, c and ρ can be made more or less arbitrary.

There are of course other important extensions in addition to allowing for arbitrary degree 
distribution, degree correlation and clustering. Further local properties considered in 
many models for social networks are households and other fully connected smaller units 
(e.g. Ball et al. (1997)), and models in which nodes and/or edges are of different types 
(e.g. Britton et al. (2007), Ball and Sirl (2012)). Several models have also been proposed 
which combine household and network structure, for example Trapman (2007), Gleeson 
(2009), Ball et al. (2010) and Ma et al. (2012). Other models aim to study and extend 
the range of global properties, such as small world networks (Watts and Strogatz (1998)) 
and dynamic network models (Barabási and Albert (1999)). This paper does not address
these (or any other) extensions; the focus is on degree distribution, degree correlation
and clustering.

Our main motivation for studying networks is to investigate social networks and to ex- 
amine what effect the three above-mentioned properties have in the event of an infectious 
disease entering the community; both in terms of the possibility and probability of an 
epidemic outbreak taking off, and also how large such an outbreak will be if it does take 
off. We study the class of SIR epidemics (e.g. Andersson and Britton (2000)) in which 
individuals are at first Susceptible (except for some introductory infectious cases) and
those who get infected become Infectious for a random period of time when they may 
infect their network neighbours, after which they Recover and become immune to further 
infection. See, for instance, Diekmann et al. (1998), Andersson (1999) and Diekmann and 
Heesterbeek (2000, Ch. 10) for early analytical contributions in this area. 

As mentioned above there have been many contributions to this area of research, in 
particular over the last decade or two. Allowing for arbitrary degree distribution, and 
studying its effect on an epidemic, dates back longer. May and Anderson (1987) concluded 
(when modelling the spread of HIV) that a heavy tail degree distribution makes the 
reproduction number R_0 large or even infinite. The important insight from their analysis
was that diseases with very low transmission probability still may be at risk of epidemics 
taking off in networks having small mean degree, if the variance of the degree distribution 
is very large. The effect of clustering on epidemics has been studied in, for example, 
Britton et al. (2008), Miller (2009) and Newman (2009). Degree correlation has often 
been analysed in combination with clustered networks (e.g. Gleeson et al. (2010)). The 
impact of clustering and degree correlation on epidemics on networks has been studied 
empirically using simulation by Badham and Stocker (2010) and Isham et al. (2011). The 
main focus of most papers concerning epidemics on networks with controllable clustering, 
degree correlation and/or degree distribution lies in studying how these features affect 
the basic reproduction number R_0, i.e. the possibility of having a major outbreak. To
derive the probability of such an outbreak, and its likely size in the event that it takes off,
requires significantly deeper analysis, which is still missing for several of the
above-mentioned models.

The current paper introduces a network model which (i) allows for more or less arbitrary 
clustering, degree correlation and degree distribution, and (ii) permits theoretical analysis 
of epidemics defined on the network. As in the configuration model, the network is formed 
by attaching stubs (i.e. half-edges) to individuals, which are then paired to form the 
edges of the network. The degree of an individual is the number of stubs emanating 
from it. The desired clustering and degree distribution is obtained by having two types 
of stubs going out from individuals. A fraction of stubs is local (this fraction being
closely related to the desired clustering); the remaining stubs are global and are connected
randomly (as described below) among stubs from all individuals. The local stubs are 
connected by grouping individuals into small local groups ('households'). For example, 
an individual with four local stubs is connected to four other individuals having local 
degree 4, thus forming a group of 5 completely connected individuals (contributing to 
increased clustering). The degree distribution is given by the distribution of the sum of 
the local and global degree of a typical individual. Finally, the desired degree correlation
ρ is obtained by manipulating how the global stubs are connected, which is controlled
by a parameter r satisfying −1 ≤ r ≤ 1. With probability 1 − |r| a stub is connected
uniformly at random among all global stubs. With probability |r| the stub is connected
to a stub having very similar total degree (if r > 0) or 'opposite' total degree (if r < 0).

The remainder of the paper is organised as follows. A more rigorous definition of the
model appears in Section 2, where a continuous-time SIR epidemic on the network is
also defined. In Section 3 we derive expressions for the degree distribution D, clustering
coefficient c and degree correlation ρ, as functions of the model parameters, and discuss
the more relevant reverse problem of choosing model parameters to obtain a desired c, ρ
and D, using a Poisson total degree distribution as a template. We also describe a simple
rewiring algorithm, motivated by Miller (2009) and Gleeson et al. (2010), which permits
the clustering in a network to be reduced in a controlled fashion without changing ρ or D.
In Section 4 we analyse the main characteristics of epidemics defined on the network for
suitably large population sizes, by exploiting approximating branching processes.
Specifically, in Section 4.1 we obtain a threshold parameter which determines whether or
not a major outbreak is possible, and derive the probability that a major outbreak
occurs (assuming that the infectious period is constant) and, in Section 4.2, we derive the
relative final size (i.e. the proportion of the population that is ultimately removed) of a
major outbreak. In Section 5 we describe how these results on epidemics are modified to
incorporate rewiring and prove that, if all other parameters are held fixed, such rewiring
increases the threshold parameter and both the probability and relative final size of
a major outbreak. In Section 6 we illustrate the theory with some numerical examples
which demonstrate that the effect of degree correlation on epidemic properties is
appreciably greater when the disease is just above threshold than when it is well above
threshold. Moreover, both the probability and size of a major outbreak broadly increase
with ρ when the disease is just above threshold, while they broadly decrease with ρ when
the disease is well above threshold. However, this behaviour is not monotonic, particularly
when clustering is low and R* is close to one. We conclude with a brief discussion.

2 The network model and the epidemic 
2.1 The network model 

Consider a network of undirected edges with n nodes (individuals). Below we define how 
to construct the network. First we define a set of random variables and briefly explain 
their interpretation in the network. 

Let G be a discrete non-negative random variable with distribution {p_k}, referred to as
the 'global degree', and let H be a strictly positive discrete random variable with
distribution {π_h}. In some cases H will reflect the household distribution in the community,
but in applications where the underlying network has no household structure H is simply
a device to introduce clustering into the network. Finally, let r be a real number
satisfying −1 ≤ r ≤ 1. The value of |r| reflects how often outgoing global edges connect to
nodes of similar (if r > 0) or 'opposite' (if r < 0) 'total degree'. Let X be a Bernoulli
random variable with parameter |r|, so P(X = 1) = |r| = 1 − P(X = 0); this variable
determines whether a stub connects to a random stub or to a stub with similar/'opposite'
degree.

The network is constructed as follows. Let H_1, H_2, ... be independent and identically
distributed copies of the random variable H. Label the n nodes 1, 2, ..., n and group the
first H_1 nodes into local group (household) one, nodes H_1 + 1, H_1 + 2, ..., H_1 + H_2 into
group 2 and so on until all individuals belong to a local group (the last group will have
a 'truncated' size). All nodes of a local group are connected to each other (for example,
the first H_1 nodes make up a fully connected component with all individuals having local
degree H_1 − 1). Let G_1, G_2, ..., G_n be independent and identically distributed copies of G;
G_i denotes the global degree of node i. The total degree of individual i, D_i, equals the
global degree plus the local degree, the local degree being one less than the group size.
For example, a node residing in a local triangle (H = 3) and having global degree k has
total degree k + 2, whereas a local singleton (H = 1) with global degree j has total degree
j. A node having global degree k has k outgoing stubs, and each of these stubs is labelled
with an independent copy of X (stubs having independent and identically distributed
X-variables with P(X = 1) = |r| = 1 − P(X = 0)) and the total degree of the node from
which it emanates. All outgoing stubs in the network with label X = 0 are connected
pairwise completely at random. The remaining stubs (having X = 1) are also connected
randomly but in a different manner. This is done by ordering all global stubs having label
X = 1 (suppose that there are n_1 such stubs) according to their total degree, and then
separating the empirical distribution so generated into n_Q (a fixed and
freely chosen positive integer) equally sized quantiles. (If n_1/n_Q is not an integer then
the n_Q quantiles are made as equal in size as possible.) The first such quantile hence
consists of the n_1/n_Q stubs having smallest label (i.e. total degree) and so on. If r > 0,
each quantile is treated in turn and all the stubs in that quantile are paired uniformly
at random. If r < 0, the stubs in the first quantile are paired uniformly at random
with those in the n_Q-th quantile, the stubs in the second quantile are paired uniformly at
random with those in the (n_Q − 1)-th quantile, and so on. Thus, if n_Q is odd, the stubs
in the middle quantile are paired uniformly at random with each other. The effect of
this pairwise connection is that nodes of similar total degree will be connected if r > 0,
whereas nodes of rather different total degree will be connected if r < 0; in both cases
leading to correlated degrees (but of different sign). There may be one unattached stub
having label X = 0 and at most n_Q unattached stubs having label X = 1 following the
above pairings. These are simply ignored. This has no effect on the asymptotic properties
of the network, nor on epidemics defined thereon, as n → ∞. In the above construction,
all the H, G and X random variables are assumed to be independent.
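
As an illustration, the construction just described can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used for the simulations reported in this paper: the name build_network and the sampler arguments sample_h and sample_g are hypothetical, ties in total degree are broken at random (the text does not specify a rule), quantile boundaries are placed by integer division, and self-loops and multiple edges are kept rather than removed.

```python
import random

def build_network(n, sample_h, sample_g, r, n_q, rng):
    """Return (household_of, total_degree, edges) for one realisation.

    sample_h(rng) and sample_g(rng) are hypothetical callables drawing a
    household size (>= 1) and a global degree (>= 0) respectively.
    """
    # Consecutive nodes form households; the last household may be truncated.
    sizes, assigned = [], 0
    while assigned < n:
        size = min(sample_h(rng), n - assigned)
        sizes.append(size)
        assigned += size
    household_of = [h for h, size in enumerate(sizes) for _ in range(size)]
    global_deg = [sample_g(rng) for _ in range(n)]
    total_deg = [sizes[household_of[i]] - 1 + global_deg[i] for i in range(n)]

    # Local edges: every household is completely connected.
    edges, start = [], 0
    for size in sizes:
        members = range(start, start + size)
        edges += [(a, b) for a in members for b in members if a < b]
        start += size

    # Each global stub carries an independent label X ~ Bernoulli(|r|).
    x0_stubs, x1_stubs = [], []
    for i in range(n):
        for _ in range(global_deg[i]):
            (x1_stubs if rng.random() < abs(r) else x0_stubs).append(i)

    def pair_within(stubs):
        rng.shuffle(stubs)
        return [(stubs[k], stubs[k + 1]) for k in range(0, len(stubs) - 1, 2)]

    def pair_across(first, second):
        rng.shuffle(first)
        rng.shuffle(second)
        return list(zip(first, second))  # zip drops any unmatched stub

    edges += pair_within(x0_stubs)  # X = 0: paired completely at random

    # X = 1: order by total degree (random tie-breaking is an assumption)
    # and cut into n_q quantiles of near-equal size.
    rng.shuffle(x1_stubs)
    x1_stubs.sort(key=lambda i: total_deg[i])
    cuts = [k * len(x1_stubs) // n_q for k in range(n_q + 1)]
    quantiles = [x1_stubs[cuts[k]:cuts[k + 1]] for k in range(n_q)]
    if r > 0:
        for q in quantiles:
            edges += pair_within(q)
    else:
        for k in range(n_q // 2):
            edges += pair_across(quantiles[k], quantiles[n_q - 1 - k])
        if n_q % 2 == 1:
            edges += pair_within(quantiles[n_q // 2])
    return household_of, total_deg, edges
```

Unmatched stubs are silently dropped, mirroring the statement that unattached stubs are ignored; a call such as build_network(1000, sample_h, sample_g, 0.5, 3, random.Random(1)) returns one realisation as an edge list.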

The network is hence made up of local completely connected groups having group size
distribution {π_h} (as n goes to infinity the effect of the last group having a truncated
household size is negligible). On top of this, each individual has global edges, the number 
being distributed as G. Some of these will be formed by connecting to other random stubs, 
the others will be formed by connecting to other stubs having similar or 'opposite' degree, 
thus creating positive or negative degree correlation. The construction of global edges may 
result in the presence of multiple edges and self-loops. However, if the degree distribution 
D has finite variance, the fraction of these will be negligible as n → ∞, so removing them
has negligible effect on the degree distribution and how stubs are connected (cf. Durrett
(2006, Theorem 3.1.2) and Janson (2009)). The special case where r = 0 or n_Q = 1 is the
network and households model (without degree correlation beyond that induced by the 
presence of households) studied by Ball et al. (2010), since in either of these situations all 
global stubs are simply paired uniformly at random. 

2.2 An epidemic model on the network 

We now define a continuous-time epidemic model for the spread of an SIR-type infectious 
disease upon the network defined in Section 2.1. We suppose that there is one initial
infective, chosen uniformly at random from the n individuals (nodes) in the population
and that the remainder of the population is susceptible. The infectious periods of different
infectives are each distributed according to a random variable I, having an arbitrary but
specified distribution. Throughout its infectious period, a given infective makes infectious
contacts with any given neighbour (either local or global) in the network at the points of
a homogeneous Poisson process having rate λ. A susceptible becomes infective as soon
as it is contacted by an infective and an infective becomes removed (and plays no further 
part in the epidemic) at the end of its infectious period. Contacts between an infective 
and an infective or removed individual have no effect. All Poisson processes describing 
infectious contacts (whether or not either or both individuals involved are the same) and 
all infectious periods are mutually independent; they are also independent of the random 
variables used to construct the network. The epidemic ends when there is no infective 
remaining in the population. 
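
The epidemic dynamics above can be simulated with an event queue, as in the following Python sketch. It is an illustration only: the names simulate_sir, adj and sample_period are hypothetical, and the sketch uses the fact that only the first contact along an edge can matter, so for each neighbour it draws a single exponential waiting time with rate lam and keeps it if it falls within the infectious period.

```python
import heapq
import random

def simulate_sir(adj, lam, sample_period, initial, rng):
    """Return a dict mapping each ultimately infected node to its infection time.

    adj maps a node to a list of its neighbours; sample_period(rng) is a
    hypothetical callable drawing one infectious period I.
    """
    infection_time = {}
    heap = [(0.0, initial)]  # candidate (infection time, node) events
    while heap:
        t, node = heapq.heappop(heap)
        if node in infection_time:
            continue  # contacting an infective or removed node has no effect
        infection_time[node] = t
        period = sample_period(rng)
        for neighbour in adj[node]:
            # Time to the first point of a rate-lam Poisson process.
            first_contact = rng.expovariate(lam)
            if first_contact < period and neighbour not in infection_time:
                heapq.heappush(heap, (t + first_contact, neighbour))
    return infection_time
```

The final size of the outbreak is the number of keys in the returned dictionary; an adjacency list can be assembled from any edge list by appending each endpoint to the other's neighbour list.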

3 Properties of the network model 

We now derive the total degree distribution D, the clustering coefficient c and the degree
correlation ρ for the network defined in Section 2.1. We treat the asymptotic case where
the number of nodes n tends to infinity.

3.1 The degree distribution 

We start with the degree distribution. From the construction it follows immediately that
a node has global degree G. The local degree is one less than the household size, and the
household size of a randomly selected node has distribution $\{\tilde{\pi}_h\}$, where
$\tilde{\pi}_h = h\pi_h/\mu_H$ and $\mu_H = \sum_j j\pi_j$, i.e. the size-biased local group-size
distribution. Let $\tilde{H}$ denote a random variable having the size-biased household
distribution. It then follows that the total degree distribution (in the network) is given by

$$D \overset{D}{=} G + \tilde{H} - 1, \qquad (1)$$

where $\overset{D}{=}$ means equal in distribution and G and $\tilde{H}$ are independent. In
particular it follows that the mean total degree is

$$\mu_D = \mu_G + \mu_H + \sigma_H^2/\mu_H - 1.$$

(Throughout the paper, for a random variable, A say, $\mu_A$ and $\sigma_A^2$ denote respectively the
mean and variance of A.)
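
For finitely supported distributions the total degree distribution can be computed by size-biasing the household distribution and convolving with the global degree distribution. The Python sketch below is illustrative; the function names and the dictionary representation of the probability mass functions are assumptions made for the example.

```python
def size_biased(pi):
    """pi maps household size h to its probability; return the size-biased pmf."""
    mean = sum(h * p for h, p in pi.items())
    return {h: h * p / mean for h, p in pi.items()}

def total_degree_pmf(p_global, pi):
    """pmf of D = G + H~ - 1, with G and the size-biased H~ independent."""
    pi_tilde = size_biased(pi)
    pmf = {}
    for g, pg in p_global.items():
        for h, ph in pi_tilde.items():
            pmf[g + h - 1] = pmf.get(g + h - 1, 0.0) + pg * ph
    return pmf

def mean_of(pmf):
    return sum(k * p for k, p in pmf.items())
```

For example, with households of size 1 or 3 (probability 1/2 each) and global degree 0 or 2 (probability 1/2 each), the size-biased household pmf puts mass 1/4 on size 1 and 3/4 on size 3, and the mean total degree is 1 + 2.5 − 1 = 2.5.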

3.2 The clustering coefficient 

There are several measures of clustering used in the literature. We use a 'probabilistic' one 
(see, for example, Trapman (2007)) where an ordered triplet of nodes (i,j,k) is selected 
completely at random among all such ordered triplets for which i is directly connected 
to j and j is directly connected to k. The clustering coefficient c is then defined as the 
probability that i and k are also directly connected (i.e. that i, j and k form a triangle). 
Thus c is given by the fraction of ordered triplets in the network that are triangles. The
clustering coefficient of the present network model is identical to that of the model in Ball
et al. (2010), since the models differ only in the way that global stubs are paired. For
large n, the proportion of ordered triangles that are not wholly within households is small
and zero in the limit as n → ∞. Thus, asymptotically, the global pairings do not yield
triangles in either of the two models, explaining why the clustering coefficients are the
same for the two models. Hence, from equation (14) of Ball et al. (2010), the clustering
coefficient c = c(G, H, r) is given by

$$c = \frac{E[H(H-1)(H-2)]}{E[H(G+H-1)(G+H-2)]}, \qquad (2)$$

where H and G are the household size and global degree distributions of the network.
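
The ratio of expectations in the clustering formula is straightforward to evaluate for finitely supported H and G. The following Python sketch is illustrative; the function name and the dictionary representation of the distributions are assumptions, and the guard for a zero denominator (no connected triplets at all) is a convention chosen for the example.

```python
def clustering_coefficient(p_global, pi):
    """Evaluate E[H(H-1)(H-2)] / E[H(G+H-1)(G+H-2)] for independent G and H.

    p_global maps global degree g to its probability and pi maps household
    size h to its probability (the unbiased household size distribution).
    """
    numerator = sum(ph * h * (h - 1) * (h - 2) for h, ph in pi.items())
    denominator = sum(
        pg * ph * h * (g + h - 1) * (g + h - 2)
        for g, pg in p_global.items()
        for h, ph in pi.items()
    )
    if denominator == 0.0:
        return 0.0  # no connected triplets: clustering taken to be zero
    return numerator / denominator
```

As a check, if every household is a triangle and every node has exactly one global edge, each node has degree 3 and is the centre of 6 ordered triplets, of which 2 close within the household, so the function returns 1/3.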



3.3 The degree correlation 

We now formulate an expression for the degree correlation ρ of the current network model.
One way to define ρ is to pick a random edge in the network and let ρ be the correlation
between the total degrees of the nodes adjacent to this edge (Newman, 2002a). The
derivation of ρ involves long but standard computations which are given in the appendix.
A key step in the derivation is to first condition on whether the chosen edge is a global
or a local edge, the former having probability $p_G$ given by

$$p_G = \frac{\mu_G}{\mu_G + \mu_{\tilde{H}} - 1}. \qquad (3)$$

If the edge is global the degree covariance (of the right and left node adjacent to the edge) 
comes from the two stubs having the same (or 'opposite') quantile(s), which happens with 
probability |r|, and if the edge is local the degree covariance stems from the nodes having 
the same local degree. 

Before giving the expression for the degree correlation ρ = ρ(G, H, r) some more notation
is required. Let $\hat{H}$ denote a random variable giving the household size of a household
edge chosen uniformly at random from all household edges. Since a household of size h
contains $\binom{h}{2}$ edges, $P(\hat{H} = h) \propto \binom{h}{2}\pi_h$ (h = 2, 3, ...), so

$$P(\hat{H} = h) = \frac{h(h-1)\pi_h}{E[H(H-1)]} \qquad (h = 2, 3, \ldots).$$

Let $\tilde{D}$ and Q denote respectively the total degree and quantile of a stub chosen uniformly
at random from all stubs in the limit as n → ∞. Then $\tilde{D} \overset{D}{=} \tilde{G} + \tilde{H} - 1$, where $\tilde{G}$
and $\tilde{H}$ are independent, and $\tilde{G}$ denotes a random variable having the size-biased global
degree distribution $\{\tilde{p}_g\}$, where $\tilde{p}_g = g p_g/\mu_G$ (g = 1, 2, ...). For i = 1, 2, ..., n_Q and
d = 1, 2, ..., let $p_{Q|\tilde{D}}(i|d) = P(Q = i \mid \tilde{D} = d)$ and $p_{\tilde{D}|Q}(d|i) = P(\tilde{D} = d \mid Q = i)$.
(These conditional probabilities are derived easily from the probability mass function
of $\tilde{D}$, noting that if $u_0 = 0$ and $u_d = P(\tilde{D} \le d)$ (d = 1, 2, ...) then

$$P(\tilde{D} = d, Q = i) = \max\left\{\min\left(u_d, \tfrac{i}{n_Q}\right) - \max\left(u_{d-1}, \tfrac{i-1}{n_Q}\right),\, 0\right\} \qquad (d = 1, 2, \ldots;\ i = 1, 2, \ldots, n_Q).)$$

Define the
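function $\sigma_{\tilde{D},n_Q}(r)$ by

$$\sigma_{\tilde{D},n_Q}(r) = \begin{cases} |r|\left(\dfrac{1}{n_Q}\sum_{i=1}^{n_Q} \mu_i^2 - \mu_{\tilde{D}}^2\right) & \text{if } r \ge 0, \\[2ex] |r|\left(\dfrac{1}{n_Q}\sum_{i=1}^{n_Q} \mu_i\,\mu_{n_Q+1-i} - \mu_{\tilde{D}}^2\right) & \text{if } r < 0, \end{cases} \qquad (4)$$

where

$$\mu_i = \sum_{d=1}^{\infty} d\, p_{\tilde{D}|Q}(d|i) \qquad (i = 1, 2, \ldots, n_Q). \qquad (5)$$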
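
The quantile probabilities and the resulting degree covariance along a global edge can be computed numerically from the probability mass function of the stub degree. The Python sketch below is illustrative and rests on an assumption: it takes the covariance to be |r| times the covariance obtained by conditioning on the quantile, i.e. the two end degrees are treated as conditionally independent given a common quantile (r > 0) or mirrored quantiles (r < 0), which is what the pairing rule of Section 2.1 yields. The function names are hypothetical.

```python
def quantile_joint(pmf_d, n_q):
    """Return {(d, i): P(D~ = d, Q = i)} for quantiles i = 1, ..., n_q.

    pmf_d maps a stub's total degree d to its probability.
    """
    joint, u_prev = {}, 0.0
    for d in sorted(pmf_d):
        u = u_prev + pmf_d[d]  # cumulative probability P(D~ <= d)
        for i in range(1, n_q + 1):
            mass = min(u, i / n_q) - max(u_prev, (i - 1) / n_q)
            if mass > 1e-15:
                joint[(d, i)] = mass
        u_prev = u
    return joint

def sigma(pmf_d, n_q, r):
    """Degree covariance along a global edge under the stated assumption."""
    joint = quantile_joint(pmf_d, n_q)
    # Conditional means E[D~ | Q = i]; each quantile has probability 1/n_q.
    mu = [0.0] * (n_q + 1)
    for (d, i), mass in joint.items():
        mu[i] += d * mass * n_q
    mean = sum(d * p for d, p in pmf_d.items())
    if r >= 0:
        second = sum(mu[i] ** 2 for i in range(1, n_q + 1)) / n_q
    else:
        second = sum(mu[i] * mu[n_q + 1 - i] for i in range(1, n_q + 1)) / n_q
    return abs(r) * (second - mean ** 2)
```

With stub degrees 1 and 3 equally likely and two quantiles, the quantiles separate the two degrees exactly, so sigma returns the full variance 1 at r = 1, its negative at r = −1, and zero when r = 0 or when there is a single quantile.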



It is shown in the appendix that 
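
$$\rho = \frac{(1-p_G)\,\sigma_{\hat{H}}^2 + p_G\,\sigma_{\tilde{D},n_Q}(r) + p_G(1-p_G)\left(\mu_{\hat{H}} + \mu_G - \mu_{\tilde{H}} - \mu_{\tilde{G}}\right)^2}{(1-p_G)\left(\sigma_{\hat{H}}^2 + \sigma_G^2\right) + p_G\left(\sigma_{\tilde{H}}^2 + \sigma_{\tilde{G}}^2\right) + p_G(1-p_G)\left(\mu_{\hat{H}} + \mu_G - \mu_{\tilde{H}} - \mu_{\tilde{G}}\right)^2}. \qquad (6)$$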

3.4 Rewiring

Note that for household size and global degree distributions H and G, the degree distri- 
bution D and the clustering coefficient c are both independent of the parameter r. Thus, 
by letting r vary between −1 and +1 and keeping the distributions of H and G fixed, it is
straightforward to tune the degree correlation in our network model without changing the
degree distribution or clustering coefficient of the network. However, if we keep r fixed
and vary, for example, the household size distribution to change the clustering coefficient
of the network, then its degree distribution D and degree correlation ρ change also. This
observation means that it is more difficult to tune just the clustering coefficient in a net- 
work. One way around this problem is to extend the rewiring construction of Gleeson et 
al. (2010) (see also Miller (2009), where the idea first originated) to our model. 

Suppose that we construct a realisation of our network model and then colour all global 
edges green and all household edges red. Household edges are also labelled according 
to their household size. Let p_RW be a real number satisfying 0 ≤ p_RW ≤ 1. Then,
independently for each household, with probability p_RW the red edges in a household are
each broken into two stubs, which retain their colour and household-size labels. For each 
h = 2, 3, • • • , the red stubs with label h are now joined uniformly at random, which, 
together with the green edges and unbroken red edges creates a new network. 

Observe that the above rewiring does not alter the degree distribution or the correlation
structure (and in particular the degree correlation) of the network but it does
change its clustering coefficient. Let c(G, H, r, p_RW) denote the clustering coefficient for
the model with rewiring probability p_RW, so c(G, H, r, 0) is the clustering coefficient of
our model without rewiring. In the limit as n → ∞, the proportion of triangles that
are not wholly within unbroken households tends to zero, whence c(G, H, r, p_RW) =
(1 − p_RW)c(G, H, r, 0). Thus, given our network model without rewiring, it is
straightforward to use the above rewiring to tune the clustering coefficient to be any value
between 0 and that of the model without rewiring.
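
A minimal Python sketch of the rewiring step is given below. It is illustrative only: the function names and the representation of households as lists of node labels are assumptions, only the red (household) edges are handled since green edges are untouched, and rewired stubs may form self-loops or multiple edges, which are kept. The helper rewiring_probability inverts the relation between the clustering coefficients with and without rewiring.

```python
import random

def rewire_households(households, p_rw, rng):
    """Return the red (household) edges after rewiring.

    households is a list of lists of node labels. Each household is broken
    with probability p_rw; the stubs of broken households are re-paired
    uniformly at random among stubs carrying the same household-size label.
    """
    red_edges, stubs_by_size = [], {}
    for members in households:
        h = len(members)
        if h >= 2 and rng.random() < p_rw:
            # Breaking the household leaves h - 1 red stubs on each member.
            for node in members:
                stubs_by_size.setdefault(h, []).extend([node] * (h - 1))
        else:
            red_edges += [(a, b) for k, a in enumerate(members)
                          for b in members[k + 1:]]
    for stubs in stubs_by_size.values():
        rng.shuffle(stubs)
        # A broken household of size h contributes h(h - 1) stubs, an even
        # number, so every stub with a given label finds a partner.
        red_edges += [(stubs[k], stubs[k + 1]) for k in range(0, len(stubs), 2)]
    return red_edges

def rewiring_probability(c_target, c_without_rewiring):
    """p_RW solving c_target = (1 - p_RW) * c_without_rewiring."""
    return 1.0 - c_target / c_without_rewiring
```

Because every node keeps the same number of red stubs, the degree of each node is unchanged by the rewiring, in line with the observation above.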

3.5 Tuning

The formulae given in Sections 3.2 and 3.3 are fairly long but simplify appreciably for the
special situation where both the household sizes and the global degrees follow Poisson-based
distributions. Specifically suppose that, with 0 ≤ μ < γ, G follows a Poisson
distribution with mean γ − μ, which we denote by Poi(γ − μ), and H follows a Poisson
distribution with mean μ that is conditioned on being strictly positive, which we denote by
Poi+(μ). Here we interpret Poi+(0) to be $\lim_{\mu \downarrow 0}$ Poi+(μ), the distribution identically equal
to 1. Thus $\pi_h = (1 - e^{-\mu})^{-1}\mu^h e^{-\mu}/h!$ (h = 1, 2, ...). Then $\tilde{H} - 1 \sim$ Poi(μ) and it follows
from (1) that the total degree D ~ Poi(γ). Further, $1 - p_G = \mu/\gamma$ and $\hat{H} - 2 \sim$ Poi(μ),
so using (2) and (6), the formulae for the clustering and degree correlation are given by

$$c = \frac{\mu^2}{\gamma^2} \quad \text{and} \quad \rho = \frac{\mu^2}{\gamma^2} + \left(1 - \frac{\mu}{\gamma}\right)\frac{\sigma_{\gamma,n_Q}(r)}{\gamma}, \qquad (7)$$

where $\sigma_{\gamma,n_Q}(r)$ is given by (4) with $\tilde{D} \sim 1 +$ Poi(γ).

Observe that $\sigma_{\gamma,n_Q}(0) = 0$, so ρ = c when r = 0, i.e. for the model studied in Ball et
al. (2010), Sections 4.3 and 4.4. Suppose that γ and μ are held fixed, so the clustering
coefficient c is also held fixed. Then as r varies from −1 to +1 the degree correlation
ρ varies between the values obtained by setting r = −1 and r = 1 in the formula for ρ
in (7). These lower and upper values for ρ are shown in Figure 1 as functions of c for
different choices of the number of quantiles n_Q, for the case when γ = 10. In the limit
as n_Q → ∞, if r > 0 then a stub with label X = 1 is paired, almost surely, with a
stub having the same total degree and $\sigma_{\gamma,n_Q}(1) \to \mathrm{var}(\tilde{D}) = \gamma$ (recall $\tilde{D} \sim 1 +$ Poi(γ)).
It follows that the corresponding upper value for ρ is 1 + c − √c. In the same limiting
situation, if r < 0 then a stub with label X = 1 is paired, almost surely, with a stub
having the 'opposite' total degree. There is no simple expression for $\lim_{n_Q \to \infty} \sigma_{\gamma,n_Q}(-1)$,
though it is easily computed. Observe from Figure 1 that very little extra is gained, in
terms of the range of possible (c, ρ), by choosing a large value of n_Q. In practice, a small
value of n_Q is beneficial as the proportions of self-loops and parallel edges between nodes,
resulting from the pairing of stubs, both increase with n_Q. Additionally, large values of
n_Q mean that the approximating branching processes have many types and numerical
calculation of quantities of interest becomes more computationally intensive.
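
The simplification in the Poisson case can be checked numerically. The Python sketch below evaluates the ratio of expectations in (2) with truncated Poisson sums and should agree with the square of the ratio of the household parameter to the mean total degree; it also evaluates the limiting upper value 1 + c − √c quoted above. The function names, the arguments gamma and mu, and the truncation point k_max are assumptions made for the illustration, and mu must be strictly positive here.

```python
import math

def poisson_pmf(mean, k_max):
    """Return [P(K = 0), ..., P(K = k_max)] for K ~ Poi(mean)."""
    probs = [math.exp(-mean)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * mean / k)
    return probs

def clustering_poisson(gamma, mu, k_max=100):
    """Evaluate (2) when G ~ Poi(gamma - mu) and H ~ Poi(mu) given H >= 1."""
    p_g = poisson_pmf(gamma - mu, k_max)
    raw = poisson_pmf(mu, k_max)
    # Condition the household size on being strictly positive.
    pi = [0.0] + [p / (1.0 - raw[0]) for p in raw[1:]]
    numerator = sum(pi[h] * h * (h - 1) * (h - 2) for h in range(1, k_max + 1))
    denominator = sum(
        p_g[g] * pi[h] * h * (g + h - 1) * (g + h - 2)
        for g in range(k_max + 1)
        for h in range(1, k_max + 1)
    )
    return numerator / denominator

def rho_upper_limit(c):
    """Limiting upper value of the degree correlation for many quantiles."""
    return 1.0 + c - math.sqrt(c)
```

For instance, with mean total degree 10 and household parameter 4 the clustering coefficient evaluates to 0.16 up to truncation error, and the limiting upper value of the degree correlation is then 0.76.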

Write c = c(γ, μ, r) and ρ = ρ(γ, μ, r) to show explicitly their dependence on the
parameters and, for γ > 0, let A_γ = {(c(γ, μ, r), ρ(γ, μ, r)) : 0 ≤ μ < γ, −1 ≤ r ≤ 1} be the set
of possible values (c, ρ) in our model when the total degree is Poi(γ). For any (c, ρ) ∈ A_γ
there is a unique (μ, r) such that (c(γ, μ, r), ρ(γ, μ, r)) = (c, ρ), so the model without
rewiring can be tuned uniquely to any attainable (c, ρ). If we allow rewiring, it is easily
seen that by choosing the rewiring probability p_RW appropriately, for each (c, ρ) lying
strictly above the lower boundary of A_γ, there is a continuum of models with clustering
coefficient c and degree correlation ρ.

A similar analysis to the above holds for other choices of total degree distribution D, 
though note that not all distributions D can be decomposed as in (1) in such a way
that the clustering may be tuned continuously. Distributions D for which this is possible 
include negative binomial and compound Poisson. Indeed any distribution D that is 
infinitely divisible may be decomposed so that the clustering coefficient is any rational 
number in [0, 1). 




Figure 1: Plot showing bounds on possible values of (c, ρ) when D ~ Poi(10). (Horizontal axis: clustering coefficient c.)

4 Epidemics on network without rewiring 

4.1 Establishment of the epidemic 

4.1.1 Approximating forward branching process 

The initial infective triggers a local (i.e. within-household) epidemic in its household.
Each infective in that local epidemic (including the initial infective) may make (global) 
infectious contact with individuals in other households. If the population size n is large, 
the probability that such global infectious contacts are all with individuals in previously 
uninfected households is close to one, owing to the random way in which the underlying 
network is formed. It follows that in the early stages of an epidemic the process of 
infected households may be approximated by a branching process, with individuals in the 
branching process corresponding to infectious households in the epidemic process. Unless 
r = 0 or n_Q = 1, this branching process needs to be multitype, since the degrees of
endpoints of a global edge with X = 1 are correlated. Except for the ancestor, the type of
an individual in the branching process is obtained by considering the primary infective, i*
say, in the corresponding single-household epidemic. The type of the individual is given
by the total-degree quantile of the stub used in constructing the global edge along which
i* was infected in the epidemic. Thus there are n_Q types of individual in the branching
process. The ancestor of the branching process is not typed in this fashion since the 
initial infective in the epidemic is chosen uniformly at random from the population and 
not infected along a global edge in the network. Nevertheless, the offspring distribution 
of the ancestor in the branching process depends on the household size and global degree 
of the initial infective in the epidemic. 

Following Ball et al. (2009), the above branching process is termed a forward branching 
process as it approximates the forward spread of an epidemic process. In Section 4.2 we
consider a backward branching process, which approximates an inverse epidemic process.

The approximation of the early stages of the epidemic process by the forward branching 
process can be made precise by constructing the branching process and, for each n = 
1, 2, • • • , a realisation of the epidemic process on a common probability space and using 
a coupling argument to show that, as n → ∞, the process of infected households in the
epidemic process converges almost surely to the multitype branching process; cf. Ball and 
Sirl (2012). Thus, if the population size n is sufficiently large, the probability that the 
epidemic becomes established and leads to a major outbreak is given approximately by 
the probability that the branching process survives (i.e. does not go extinct). Moreover, 
whether or not a major outbreak can occur with non-zero probability is determined by 
whether or not the branching process is supercritical. 

We now determine the means and probability generating functions (PGFs) of the offspring 
distributions of the branching process, which determine respectively whether a major 
outbreak can occur and, if so, its probability. The offspring distribution is different in 
the initial generation from that of all subsequent generations, since the initial infective 
is chosen uniformly at random from the population (so its local and global degrees are 
independent), while subsequent primary infectives are infected through the network and 
their local and global degrees are dependent. We focus first on the offspring means for 
a non-initial generation, since they determine whether or not the branching process is 
supercritical. 

4.1.2 Offspring mean matrix and threshold parameter 

Let $\mathcal{B}_F$ denote the above multitype forward branching process and let $\tilde{\mathcal{B}}_F$ be the multitype branching process describing the descendants of a typical first-generation individual in $\mathcal{B}_F$. Thus the type-dependent offspring law is the same for all generations in $\tilde{\mathcal{B}}_F$. For $i = 1, 2, \cdots, n_Q$, let $\tilde{C}_i = (\tilde{C}_{i1}, \tilde{C}_{i2}, \cdots, \tilde{C}_{i n_Q})$ be a vector random variable describing the numbers of offspring of different types of a typical type-$i$ individual in the branching process $\tilde{\mathcal{B}}_F$. Thus, $\tilde{C}_{ij}$ is the number of type-$j$ primary infectives generated by a typical single-household epidemic, whose primary infective is of type $i$. Let $M = [m_{ij}]$ be the $n_Q \times n_Q$ matrix with elements $m_{ij} = E[\tilde{C}_{ij}]$ and let $R_*$ be the dominant eigenvalue of $M$. Then by standard multitype branching process theory (see Mode (1971), Chapter 1, Theorem 7.1), the branching process $\mathcal{B}_F$ survives with strictly positive probability if and only if $R_* > 1$. Thus $R_*$ serves as a threshold parameter for our epidemic model. Note that this and subsequent results using the theory of multitype branching processes require assumptions regarding the irreducibility and/or positive regularity of the mean matrix $M$, which are met for all but highly pathological choices of $G$, $H$ and $n_Q$.
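
As a small numerical illustration of the threshold criterion, the following sketch computes the dominant eigenvalue of a mean matrix with NumPy. The function name `threshold_parameter` and the 2-type matrix `M_example` are hypothetical and are not taken from the model above; in practice the entries of $M$ would come from the formula for $m_{ij}$ derived below.

```python
import numpy as np

def threshold_parameter(M):
    """Return the dominant eigenvalue of a non-negative mean matrix M.

    For a non-negative square matrix the spectral radius is itself an
    eigenvalue (Perron-Frobenius), so the largest modulus is returned.
    """
    M = np.asarray(M, dtype=float)
    eigenvalues = np.linalg.eigvals(M)
    return float(np.max(np.abs(eigenvalues)))

# Hypothetical 2-type mean matrix, for illustration only.
M_example = [[0.6, 0.3],
             [0.2, 0.9]]
R_star = threshold_parameter(M_example)
major_outbreak_possible = R_star > 1.0
```

For this illustrative matrix the dominant eigenvalue is slightly above one, so the corresponding branching process would be supercritical.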

In order to compute $M$, and hence $R_*$, we need a further probability distribution. For $d = 1, 2, \cdots$ and $h = 1, 2, \cdots, d$, let $\pi_h^{(d)}$ be the probability that a stub chosen uniformly at random from all stubs having total degree $d$ belongs to an individual who resides in a household of size $h$. Note that this probability is the same for stubs with label X = 0 and stubs with label X = 1, and that

_ ^hhpd-h+i _ ^hpd-h+i 

L,fe'=l Kh'h'Pd-h'+l 2^h'=l n h'Pd-h'+l 
To obtain $m_{ij}$, we condition first on the total degree of a typical type-$i$ primary infective and then on the size of its household yielding

\[ m_{ij} = \sum_{d=1}^{\infty} p_{D|Q}(d|i) \sum_{h=1}^{d} \pi_h^{(d)} E[\tilde{C}_{ij}^{(h,d)}], \tag{8} \]

where $\tilde{C}_i^{(h,d)} = (\tilde{C}_{i1}^{(h,d)}, \tilde{C}_{i2}^{(h,d)}, \cdots, \tilde{C}_{i n_Q}^{(h,d)})$ is defined analogously to $\tilde{C}_i$, except we condition on the type-$i$ individual residing in a household of size $h$ and having total degree $d$. (Note also that $p_{D|Q}(d|i)$ is independent of the X-label of the individual concerned.)

Consider a typical size-$h$ single-household epidemic, with one initial infective, who is of type $i$ and has total degree $d$, and label the household members $0, 1, \cdots, h-1$, where 0 is the initial infective. For $k = 1, 2, \cdots, h-1$, let $\chi_k = 1$ if individual $k$ is infected by the single-household epidemic and let $\chi_k = 0$ otherwise. Then

\[ \tilde{C}_i^{(h,d)} = \tilde{C}_i^{(h,d)}(0) + \sum_{k=1}^{h-1} \chi_k \tilde{C}_i^{(h,d)}(k), \tag{9} \]

where, for $k = 0, 1, \cdots, h-1$, $\tilde{C}_i^{(h,d)}(k) = (\tilde{C}_{i1}^{(h,d)}(k), \tilde{C}_{i2}^{(h,d)}(k), \cdots, \tilde{C}_{i n_Q}^{(h,d)}(k))$, with $\tilde{C}_{ij}^{(h,d)}(k)$ being the number of type-$j$ primary infectives generated by individual $k$ in the single-household epidemic if it becomes infected. (Throughout the paper, sums are zero if vacuous.)

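Let $T^{(h)} = \sum_{k=1}^{h-1} \chi_k$ be the final size of the above single-household epidemic, not including the initial case, and let $\mu^{(h)}(\lambda) = E[T^{(h)}]$. Then, see Ball (1986) equations (2.25) and (2.26),

\[ \mu^{(h)}(\lambda) = h - 1 - \sum_{k=0}^{h-1} \binom{h-1}{k} \alpha_k \phi_I(\lambda k)^{h-k} \qquad (h = 1, 2, \cdots), \]

where $\phi_I(\theta) = E[\exp(-\theta I)]$ $(\theta \ge 0)$ is the moment generating function of $I$ and $\alpha_0, \alpha_1, \cdots$ are defined recursively by

\[ \sum_{i=0}^{k} \binom{k}{i} \alpha_i \phi_I(\lambda i)^{k-i} = k \qquad (k = 0, 1, \cdots). \]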



Note that $\chi_k$ and $\tilde{C}_{ij}^{(h,d)}(k)$ are independent, because whether or not an individual is infected by the single-household epidemic is independent of its infectious period, so taking expectations of (9) and noting that $\tilde{C}_i^{(h,d)}(1), \tilde{C}_i^{(h,d)}(2), \cdots, \tilde{C}_i^{(h,d)}(h-1)$ are identically distributed yields

\[ E[\tilde{C}_{ij}^{(h,d)}] = E[\tilde{C}_{ij}^{(h,d)}(0)] + \mu^{(h)}(\lambda) E[\tilde{C}_{ij}^{(h,d)}(1)]. \tag{10} \]
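
The mean single-household final size $\mu^{(h)}(\lambda)$ that enters (10) can be evaluated numerically. The sketch below is illustrative only: it assumes that the coefficients satisfy the triangular recursion $\sum_{i=0}^{k} \binom{k}{i} \alpha_i \phi_I(\lambda i)^{k-i} = k$ and that $\mu^{(h)}(\lambda) = h - 1 - \sum_{k=0}^{h-1} \binom{h-1}{k} \alpha_k \phi_I(\lambda k)^{h-k}$; the function name `mean_final_size` and the parameter values are hypothetical. For a constant infectious period the output can be checked against the Reed-Frost values $1 - q$ (for $h = 2$) and $2 - 4q^2 + 2q^3$ (for $h = 3$), where $q$ is the probability of escaping infection from a given infective.

```python
from math import comb, exp

def mean_final_size(h, phi, lam):
    """Mean final size (excluding the initial case) in a household of size h.

    phi is the function theta -> E[exp(-theta * I)] and lam is the infection rate.
    """
    alpha = []
    for k in range(h):
        # Assumed recursion: sum_{i=0}^{k} C(k,i) alpha_i phi(lam*i)^(k-i) = k.
        # The i = k term equals alpha_k, so solve for it directly.
        partial = sum(comb(k, i) * alpha[i] * phi(lam * i) ** (k - i)
                      for i in range(k))
        alpha.append(k - partial)
    return (h - 1) - sum(comb(h - 1, k) * alpha[k] * phi(lam * k) ** (h - k)
                         for k in range(h))

# Illustration: constant infectious period of length 1 (hypothetical values).
lam = 0.5
phi_constant = lambda theta: exp(-theta * 1.0)
q = exp(-lam)
mu_2 = mean_final_size(2, phi_constant, lam)
mu_3 = mean_final_size(3, phi_constant, lam)
```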

To determine $E[\tilde{C}_{ij}^{(h,d)}(k)]$ $(k = 0, 1)$, for $i, j = 1, 2, \cdots, n_Q$ and $l = 0, 1$, let $p_{ij}^{(l)}(r)$ be the probability that, when constructing the network, a given stub with X-label $l$ and total degree quantile $i$ is paired with a stub having total degree quantile $j$. Then, $p_{ij}^{(0)}(r) = 1/n_Q$ and

\[ p_{ij}^{(1)}(r) = \begin{cases} \delta_{i,j} & \text{if } r > 0, \\ \delta_{i,n_Q+1-j} & \text{if } r < 0, \end{cases} \]

where $\delta_{i,j} = 1$ if $i = j$ and $\delta_{i,j} = 0$ if $i \ne j$. Further, for $d = 1, 2, \cdots$, $j = 1, 2, \cdots, n_Q$ and $l = 0, 1$, let $\hat{p}_{dj}^{(l)}(r)$ be the probability that a stub chosen uniformly from all stubs having total degree $d$ and X-label $l$ is paired with a stub from quantile $j$. Then $\hat{p}_{dj}^{(0)}(r) = 1/n_Q$ and

\[ \hat{p}_{dj}^{(1)}(r) = \sum_{i=1}^{n_Q} p_{Q|D}(i|d)\, p_{ij}^{(1)}(r). \]



Consider the individual labelled 0, i.e. the primary case, in the above single-household epidemic. This individual has total degree $d$ and resides in a household of size $h$, so it has $d - h + 1$ global neighbours, one of whom infected it. Thus the individual has $d - h$ global edges along which it can spread the epidemic. Each of the corresponding stubs independently has X-label 1 with probability $|r|$, so

\[ E[\tilde{C}_{ij}^{(h,d)}(0)] = (d - h)\, p_I \left[(1 - |r|) n_Q^{-1} + |r|\, p_{ij}^{(1)}(r)\right], \tag{11} \]

where $p_I = 1 - \phi_I(\lambda)$ is the unconditional probability that a given infective infects a given susceptible neighbour.

Now consider the individual labelled 1 in the single-household epidemic and suppose that it becomes infected. The global degree of individual 1 is distributed according to $G$. Thus, for $g = 1, 2, \cdots$, with probability $p_g$, individual 1 has $g$ global neighbours and hence total degree $g + h - 1$. Each of these $g$ global neighbours is infected with probability $p_I$ and the X-labels of the corresponding outgoing stubs from individual 1 are independent Bernoulli random variables with success probability $|r|$. Summing over $g$ and taking expectations yields

\[ E[\tilde{C}_{ij}^{(h,d)}(1)] = \sum_{g=1}^{\infty} p_g\, g\, p_I \left[(1 - |r|) n_Q^{-1} + |r|\, \hat{p}_{g+h-1,j}^{(1)}(r)\right]. \tag{12} \]

Note that if $g = 0$ then individual 1 has no global neighbour to infect. Note also that $E[\tilde{C}_{ij}^{(h,d)}(1)]$ is independent of both $d$ and $i$, as indeed is the distribution of $\tilde{C}_i^{(h,d)}(1)$. Combining (8), (10), (11) and (12) gives

\[ m_{ij} = p_I \sum_{d=1}^{\infty} p_{D|Q}(d|i) \sum_{h=1}^{d} \pi_h^{(d)} \left\{ (d - h)\left[(1 - |r|) n_Q^{-1} + |r|\, p_{ij}^{(1)}(r)\right] + \mu^{(h)}(\lambda) \sum_{g=1}^{\infty} g\, p_g \left[(1 - |r|) n_Q^{-1} + |r|\, \hat{p}_{g+h-1,j}^{(1)}(r)\right] \right\}. \tag{13} \]

To summarise, equation (13) defines the elements of the mean matrix $M = [m_{ij}]$ of the branching process $\tilde{\mathcal{B}}_F$. The dominant eigenvalue of $M$, denoted by $R_*$, determines whether or not a major outbreak is possible, as described at the beginning of the section.



4.1.3 Offspring PGFs and major outbreak probability 

We now derive the offspring PGFs for the multitype branching processes $\mathcal{B}_F$ and $\tilde{\mathcal{B}}_F$, which enable their extinction probabilities (and hence the probability of a major outbreak) to be determined. Observe that if the infectious periods are not constant, i.e. there does not exist $\iota > 0$ such that $P(I = \iota) = 1$, then the infectious periods of individuals infected by a single-household epidemic are not independent of the final size of that epidemic, which complicates, for example, using the decomposition (9) to determine the offspring PGFs of $\tilde{\mathcal{B}}_F$. As in Ball et al. (2010), it is possible to use the theory of final state random variables developed in Ball and O'Neill (1999) to obtain expressions for these offspring PGFs in terms of Gontcharoff polynomials, though the details are rather involved and we do not present them here. Instead, we consider the special case of a constant infectious period, when the above-mentioned difficulties do not arise. Thus in this subsection, but not elsewhere in Section 4, we assume that $I \equiv \iota$ (i.e. $P(I = \iota) = 1$), so any given infective infects each of its neighbours (local or global) independently with probability $p_I = 1 - \exp(-\lambda\iota)$. The epidemic model is then an extension of the standard Reed-Frost epidemic (see, for example, Andersson and Britton (2000), Chapter 1) to our network model. Note also that, in a physics setting, this Reed-Frost type model can be viewed as an extension, to incorporate degree correlation, of the bond percolation model of Gleeson (2009) for a class of clustered networks. Recall also that, as is well known for Reed-Frost type epidemics, the probability and the expected relative final size of a major outbreak are equal (cf. final paragraph of Section 4.2).

As noted previously, the forward branching process $\mathcal{B}_F$ has a different offspring distribution in the initial generation than in all subsequent generations. We consider first a non-initial generation. For $i = 1, 2, \cdots, n_Q$ and $s = (s_1, s_2, \cdots, s_{n_Q})$ with $0 \le s_j \le 1$ $(j = 1, 2, \cdots, n_Q)$, let

\[ f_{\tilde{C}_i}(s) = E\left[\prod_{j=1}^{n_Q} s_j^{\tilde{C}_{ij}}\right] \]

be the joint PGF of $\tilde{C}_i$. (Throughout the paper, for a vector random variable, $Y = (Y_1, Y_2, \cdots, Y_{n_Q})$ say, we use $f_Y(s)$ to denote its joint PGF.) Conditioning on the household size and total degree of a typical type-$i$ primary infective, as at (8), yields

\[ f_{\tilde{C}_i}(s) = \sum_{d=1}^{\infty} p_{D|Q}(d|i) \sum_{h=1}^{d} \pi_h^{(d)} f_{\tilde{C}_i^{(h,d)}}(s). \tag{14} \]

The decomposition (9) may be expressed as

\[ \tilde{C}_i^{(h,d)} = \tilde{C}_i^{(h,d)}(0) + \sum_{k=1}^{T^{(h)}} \tilde{C}_i^{(h,d)}(k), \tag{15} \]

where now $\tilde{C}_i^{(h,d)}(1), \tilde{C}_i^{(h,d)}(2), \cdots, \tilde{C}_i^{(h,d)}(T^{(h)})$ give the offspring vectors for the secondary cases in the single-household epidemic. Further, since the infectious period is constant, conditional upon $T^{(h)}$, the random vectors $\tilde{C}_i^{(h,d)}(1), \tilde{C}_i^{(h,d)}(2), \cdots, \tilde{C}_i^{(h,d)}(T^{(h)})$ are independent and identically distributed copies of a random vector whose distribution is independent of $T^{(h)}$. Hence, (15) implies that

\[ f_{\tilde{C}_i^{(h,d)}}(s) = f_{\tilde{C}_i^{(h,d)}(0)}(s)\, f_{T^{(h)}}\left(f_{\tilde{C}_i^{(h,d)}(1)}(s)\right), \tag{16} \]
where $f_{T^{(h)}}(s)$ $(0 \le s \le 1)$ is the PGF of $T^{(h)}$, which, using Ball (1986), Theorem 2.6, is given by

\[ f_{T^{(h)}}(s) = s^{h-1} \sum_{k=0}^{h-1} \binom{h-1}{k} \alpha_k(s) (1 - p_I)^{k(h-k)} \qquad (h = 1, 2, \cdots), \tag{17} \]

where $\alpha_0(s), \alpha_1(s), \cdots$ are defined recursively by

\[ \sum_{i=0}^{k} \binom{k}{i} (1 - p_I)^{i(k-i)} \alpha_i(s) = s^{-k} \qquad (k = 0, 1, \cdots). \tag{18} \]



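To complete the derivation of $f_{\tilde{C}_i}(s)$, we obtain expressions for $f_{\tilde{C}_i^{(h,d)}(0)}(s)$ and $f_{\tilde{C}_i^{(h,d)}(1)}(s)$. Consider a typical type-$i$ primary infective, i* say, and let j* be a susceptible global neighbour of i*. Let $\boldsymbol{\chi}_i = (\chi_{i1}, \chi_{i2}, \cdots, \chi_{i n_Q})$, where $\chi_{ik} = 1$ if i* infects j* and the edge between i* and j* was formed by connecting to a stub from j* belonging to quantile $k$, and $\chi_{ik} = 0$ otherwise. (Note that if i* does not infect j* then every element of $\boldsymbol{\chi}_i$ is zero, and if i* does infect j* then precisely one element of $\boldsymbol{\chi}_i$ is one and all other elements of $\boldsymbol{\chi}_i$ are zero.) For $i = 1, 2, \cdots, n_Q$ and $s \in [0, 1]^{n_Q}$, define the PGF of $\boldsymbol{\chi}_i$

\[ \tilde{g}_i(s) = E\left[\prod_{j=1}^{n_Q} s_j^{\chi_{ij}}\right] = 1 - p_I + p_I \sum_{j=1}^{n_Q} \left[(1 - |r|) n_Q^{-1} + |r|\, p_{ij}^{(1)}(r)\right] s_j. \tag{19} \]

Then using a similar argument to the derivation of (11) yields

\[ f_{\tilde{C}_i^{(h,d)}(0)}(s) = \left(\tilde{g}_i(s)\right)^{d-h}. \tag{20} \]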



Now consider a typical individual, i* say, infected by a single-household epidemic and suppose that i* has total degree $d$. Let j* be a susceptible global neighbour of i* and define $\boldsymbol{\chi}_d = (\chi_{d1}, \chi_{d2}, \cdots, \chi_{d n_Q})$ in the same way as $\boldsymbol{\chi}_i$, but for this individual i* and its neighbour j*. Letting

\[ g_d(s) = E\left[\prod_{j=1}^{n_Q} s_j^{\chi_{dj}}\right] = 1 - p_I + p_I \sum_{j=1}^{n_Q} \left[(1 - |r|) n_Q^{-1} + |r|\, \hat{p}_{dj}^{(1)}(r)\right] s_j, \tag{21} \]

a similar argument to the derivation of (12) yields

\[ f_{\tilde{C}_i^{(h,d)}(1)}(s) = \sum_{g=0}^{\infty} p_g \left(g_{g+h-1}(s)\right)^{g}. \tag{22} \]

Combining (14), (16), (20) and (22) gives the PGF of the offspring random variable $\tilde{C}_i$ for a typical type-$i$ individual in $\tilde{\mathcal{B}}_F$.

Consider now the initial generation of the forward branching process $\mathcal{B}_F$. Since the initial infective, i* say, in the epidemic is not infected through the network, the ancestor in $\mathcal{B}_F$ is not typed according to its total degree. Let $C = (C_1, C_2, \cdots, C_{n_Q})$ denote the offspring random variable for the ancestor in $\mathcal{B}_F$. Then, conditioning on i*'s global degree and household size,

\[ f_C(s) = \sum_{g=0}^{\infty} \sum_{h=1}^{\infty} p_g \pi_h f_{C^{(h,g+h-1)}}(s), \tag{23} \]
where, for $h = 1, 2, \cdots$ and $d = h + 1, h + 2, \cdots$, $C^{(h,d)}$ denotes the offspring random variable for the ancestor given that i* resides in a household of size $h$ and has total degree $d$. Analogous to (15), $C^{(h,d)}$ admits the decomposition

\[ C^{(h,d)} = C^{(h,d)}(0) + \sum_{k=1}^{T^{(h)}} C^{(h,d)}(k), \tag{24} \]

whence, as at (16),

\[ f_{C^{(h,d)}}(s) = f_{C^{(h,d)}(0)}(s)\, f_{T^{(h)}}\left(f_{C^{(h,d)}(1)}(s)\right). \tag{25} \]

Now $C^{(h,d)}(1)$ has the same distribution as $\tilde{C}_i^{(h,d)}(1)$, so $f_{C^{(h,d)}(1)}(s)$ is given by the right hand side of (22). Note that if i* has household size $h$ and total degree $d$, then, since all of its $d - h + 1$ global neighbours are susceptible, its offspring distribution is the same as that of a secondary infective having total degree $d$ in a single size-$h$ household epidemic. Thus,

\[ f_{C^{(h,d)}(0)}(s) = \left(g_d(s)\right)^{d-h+1}. \tag{26} \]

The offspring PGF $f_C$ of the ancestor in $\mathcal{B}_F$ now follows using (23), (25), (22) and (26).

We now determine the probability of a major outbreak. Suppose that $R_* > 1$. For $i = 1, 2, \cdots, n_Q$, let $\sigma_i$ be the probability that the branching process $\tilde{\mathcal{B}}_F$ goes extinct given that there is one ancestor whose type is $i$, and let $\sigma = (\sigma_1, \sigma_2, \cdots, \sigma_{n_Q})$. Then (see, for example, Mode (1971), Section 1.7.1), $\sigma$ is the unique solution in $[0, 1)^{n_Q}$ of the equations

\[ f_{\tilde{C}_i}(\sigma) = \sigma_i \qquad (i = 1, 2, \cdots, n_Q). \tag{27} \]

By conditioning on the number and type of offspring of the ancestor in $\mathcal{B}_F$, the probability that the branching process $\mathcal{B}_F$ survives (and hence the probability that a major outbreak occurs) is

\[ p_{maj} = 1 - f_C(\sigma). \tag{28} \]
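
Equations (27) and (28) can be solved numerically by iterating the offspring PGFs from the origin, since the iterates then increase to the smallest non-negative fixed point. The sketch below illustrates the technique only: the Poisson offspring laws, the matrix `mean_matrix` and the ancestor means are hypothetical stand-ins, not the offspring PGFs (14) and (23) of the model.

```python
from math import exp

def extinction_probabilities(offspring_pgfs, tol=1e-12, max_iter=100_000):
    """Iterate sigma <- (f_1(sigma), ..., f_n(sigma)) starting from sigma = 0.

    offspring_pgfs[i] maps a list s of length n to the joint PGF of the
    offspring of a type-i individual evaluated at s.
    """
    sigma = [0.0] * len(offspring_pgfs)
    for _ in range(max_iter):
        new_sigma = [f(sigma) for f in offspring_pgfs]
        if max(abs(a - b) for a, b in zip(new_sigma, sigma)) < tol:
            return new_sigma
        sigma = new_sigma
    raise RuntimeError("fixed-point iteration did not converge")

def poisson_pgf(means):
    """Joint PGF of independent Poisson offspring counts (illustrative assumption)."""
    return lambda s: exp(sum(m * (s_j - 1.0) for m, s_j in zip(means, s)))

# Hypothetical supercritical example with two types.
mean_matrix = [[1.2, 0.5],
               [0.4, 1.0]]
non_initial_pgfs = [poisson_pgf(row) for row in mean_matrix]
ancestor_pgf = poisson_pgf([0.8, 0.8])

sigma = extinction_probabilities(non_initial_pgfs)   # analogue of (27)
p_maj = 1.0 - ancestor_pgf(sigma)                    # analogue of (28)
```

Close to the threshold $R_* = 1$ the iteration converges slowly, so a larger `max_iter` or a Newton-type solver may be preferable there.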



4.2 Final outcome of a major outbreak 

We now consider the relative final size of a major outbreak. The main tool that we use 
is the susceptibility set (Ball (2000), Ball and Lyne (2001) and Ball and Neal (2002)), 
which we now define. Label the $n$ nodes (individuals) $1, 2, \cdots, n$. For $i = 1, 2, \cdots, n$, by sampling from the infectious period distribution and the Poisson processes describing when $i$ makes infectious contact with its neighbours, construct a (random) list of who $i$ would have infectious contact with if $i$ was to become infected. Then construct a directed random graph, with nodes $1, 2, \cdots, n$, in which for any pair of nodes with $i \ne j$, there is a directed edge from $i$ to $j$ if and only if $j$ is in $i$'s list. For $i = 1, 2, \cdots, n$, the susceptibility set of node $i$ is the set of all nodes $j$ from which there is a chain of directed edges to $i$ (including $i$ itself).

Observe that a node, $i$ say, is ultimately infected by the epidemic if and only if the initial infective belongs to $i$'s susceptibility set. Suppose that the population size $n$ is large. Then, as with the early stages of the epidemic, we can approximate the susceptibility set of a node, i* say, chosen uniformly at random from the population by a households-based multitype branching process. We first consider i*'s local susceptibility set, i.e. the set of nodes in i*'s household from which there is a chain of within-household directed edges to i* (including i* itself). We next consider each member, j* say, of i*'s local susceptibility set and determine which of j*'s global neighbours have a directed edge joining them to j*. The set of all such global neighbours of i*'s household forms the first generation of the (backward) approximating branching process, with each such global neighbour, k* say, (generation-1 individual in the branching process) being typed by the quantile of the corresponding stub from k*. The process is then repeated in the obvious fashion to obtain the second generation of the backward branching process, and so on. Denote this branching process by $\mathcal{B}_B$. As with the forward branching process, the offspring law of $\mathcal{B}_B$ is different in the initial generation from that of all subsequent generations. Let $\tilde{\mathcal{B}}_B$ be the multitype branching process describing the descendants of a typical first-generation individual in $\mathcal{B}_B$.

We conjecture that, subject to mild conditions on the household size and global degree distributions, the expected relative final size of a major outbreak converges to the survival probability of $\mathcal{B}_B$ as $n \to \infty$. This is proved formally in Ball et al. (2009) for the model with constant household size and no global degree correlation (i.e. r = 0); however, the proof in Ball et al. (2009) is long and we do not attempt here to adapt it to the present model. Further, assuming the conjecture is true, the argument in Ball et al. (2012) can be used to show that the relative final size of a major outbreak converges in probability to the survival probability of $\mathcal{B}_B$ as $n \to \infty$. The proof in Ball et al. (2012) is also quite long and we do not attempt to adapt it to the present model. The numerical illustrations in Section 6 (see Figure 2 and the surrounding commentary) support the above conjecture.

We determine now the offspring PGFs for $\mathcal{B}_B$ and $\tilde{\mathcal{B}}_B$. We do not assume that the infectious periods are constant. Let $B = (B_1, B_2, \cdots, B_{n_Q})$ denote the offspring random variable for the ancestor in $\mathcal{B}_B$ and, for $i = 1, 2, \cdots, n_Q$, let $\tilde{B}_i = (\tilde{B}_{i1}, \tilde{B}_{i2}, \cdots, \tilde{B}_{i n_Q})$ denote the offspring random variable for a typical type-$i$ individual in $\tilde{\mathcal{B}}_B$.

Consider $\tilde{B}_i$ first. Let k* be as above and assume it has type $i$. Then arguing as at (14) yields

\[ f_{\tilde{B}_i}(s) = \sum_{d=1}^{\infty} p_{D|Q}(d|i) \sum_{h=1}^{d} \pi_h^{(d)} f_{\tilde{B}_i^{(h,d)}}(s), \tag{29} \]

where $\tilde{B}_i^{(h,d)}$ denotes the corresponding offspring random variable when k* belongs to a household of size $h$ and has total degree $d$. Let $M^{(h)} + 1$ denote the size of a typical local susceptibility set in a household of size $h$. For $l = 0, 1$, let $\tilde{B}_i^{(h,d)}(l) = (\tilde{B}_{i1}^{(h,d)}(l), \tilde{B}_{i2}^{(h,d)}(l), \cdots, \tilde{B}_{i n_Q}^{(h,d)}(l))$, where $\tilde{B}_{ij}^{(h,d)}(0)$ is the number of type-$j$ global neighbours of k* that would attempt to infect k* if they become infected and $\tilde{B}_{ij}^{(h,d)}(1)$ is defined similarly but for any other member of k*'s local susceptibility set. Then, noting that infectious global neighbours of an individual make infectious contact with that individual independently, each with probability $p_I$,

\[ f_{\tilde{B}_i^{(h,d)}}(s) = f_{\tilde{B}_i^{(h,d)}(0)}(s)\, f_{M^{(h)}}\left(f_{\tilde{B}_i^{(h,d)}(1)}(s)\right), \]
where, for $d = 1, 2, \cdots$ and $h = 1, 2, \cdots, d + 1$,

\[ f_{\tilde{B}_i^{(h,d)}(0)}(s) = \left(\tilde{g}_i(s)\right)^{d-h} \quad \text{and} \quad f_{\tilde{B}_i^{(h,d)}(1)}(s) = \sum_{g=0}^{\infty} p_g \left(g_{g+h-1}(s)\right)^{g}, \]

and $\tilde{g}_i(s)$ and $g_d(s)$ are defined by (19) and (21).

Turning to the PGF of $B$, similar arguments to the above show that, in an obvious notation,

\[ f_B(s) = \sum_{g=0}^{\infty} \sum_{h=1}^{\infty} p_g \pi_h\, f_{B^{(h,g+h-1)}(0)}(s)\, f_{M^{(h)}}\left(f_{B^{(h,g+h-1)}(1)}(s)\right), \tag{30} \]

where, for $d = 0, 1, \cdots$ and $h = 1, 2, \cdots, d + 1$,

\[ f_{B^{(h,d)}(0)}(s) = \left(g_d(s)\right)^{d-h+1} \quad \text{and} \quad f_{B^{(h,d)}(1)}(s) = \sum_{g=0}^{\infty} p_g \left(g_{g+h-1}(s)\right)^{g}. \]

The probability mass function (and hence the PGF) of $M^{(h)}$ may be determined using the following result (see Ball and Neal (2002), Lemma 3.1). For $h = 2, 3, \cdots$,

\[ P(M^{(h)} = k) = \binom{h-1}{k} \phi_I((k+1)\lambda)^{h-1-k} P(M^{(k+1)} = k) \qquad (k = 0, 1, \cdots, h-1), \]

where

\[ \sum_{l=1}^{k} \binom{k-1}{l-1} \phi_I(l\lambda)^{k-l} P(M^{(l)} = l - 1) = 1 \qquad (k = 1, 2, \cdots). \]
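
The recursive structure of this result lends itself to direct computation. The sketch below assumes the reading $P(M^{(h)} = k) = \binom{h-1}{k} \phi_I((k+1)\lambda)^{h-1-k} P(M^{(k+1)} = k)$, with the probabilities $P(M^{(l)} = l-1)$ obtained from the normalisation $\sum_{l=1}^{k} \binom{k-1}{l-1} \phi_I(l\lambda)^{k-l} P(M^{(l)} = l-1) = 1$; the function name `susceptibility_set_pmf` and the parameter values are hypothetical. For a constant infectious period the result can be checked against the Reed-Frost final size distribution in a household of size 3.

```python
from math import comb, exp

def susceptibility_set_pmf(h_max, phi, lam):
    """Return a dict pmf with pmf[h][k] = P(M^(h) = k), h = 1..h_max, k = 0..h-1.

    phi is the function theta -> E[exp(-theta * I)] and lam is the infection rate.
    """
    full = {}  # full[h] = P(M^(h) = h - 1): everyone else in the household is included
    pmf = {}
    for h in range(1, h_max + 1):
        # Normalisation: sum_{l=1}^{h} C(h-1,l-1) phi(l*lam)^(h-l) full[l] = 1.
        partial = sum(comb(h - 1, l - 1) * phi(l * lam) ** (h - l) * full[l]
                      for l in range(1, h))
        full[h] = 1.0 - partial
        pmf[h] = [comb(h - 1, k) * phi((k + 1) * lam) ** (h - 1 - k) * full[k + 1]
                  for k in range(h)]
    return pmf

# Illustration: constant infectious period of length 1 (hypothetical values).
lam = 0.5
q = exp(-lam)
pmf = susceptibility_set_pmf(3, lambda theta: exp(-theta), lam)
mean_M3 = sum(k * p for k, p in enumerate(pmf[3]))
```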

It is readily shown that $E[M^{(h)}] = E[T^{(h)}]$ $(h = 1, 2, \cdots)$, see Lemma 1 in the appendix of Ball et al. (1997), using which it follows that $\tilde{\mathcal{B}}_B$ and $\tilde{\mathcal{B}}_F$ have the same offspring mean matrix. Thus the branching process $\mathcal{B}_B$ survives with strictly positive probability if and only if $R_* > 1$. For $i = 1, 2, \cdots, n_Q$, let $\xi_i$ be the probability that the branching process $\tilde{\mathcal{B}}_B$ goes extinct given that there is one ancestor whose type is $i$, and let $\xi = (\xi_1, \xi_2, \cdots, \xi_{n_Q})$. Then, if $R_* > 1$, $\xi$ is the unique solution in $[0, 1)^{n_Q}$ of the equations

\[ f_{\tilde{B}_i}(\xi) = \xi_i \qquad (i = 1, 2, \cdots, n_Q) \]

and, for $n$ suitably large, the relative final size of a major outbreak, $z$ say, is given approximately by

\[ z = 1 - f_B(\xi). \tag{31} \]

There does not appear to exist a similar recursive expression for the PGF $f_{M^{(h)}}(s)$ to that for $f_{T^{(h)}}(s)$ given by (17) and (18), except when the infectious period is constant. In this case $M^{(h)}$ and $T^{(h)}$ have the same distribution, from which it easily follows (using the PGF formulae in the preceding sections) that $p_{maj} = z$.
5 Epidemics on rewired networks



5.1 Properties of epidemics 

We now extend the results of the previous section to the model in which the edges in a fraction $p_{RW}$ of households are rewired.

Suppose first that $p_{RW} = 1$, so all household edges are rewired. The early stages of an epidemic in the rewired network may be approximated by a multitype branching process as in Section 4.1, except now a local epidemic is the spread of disease along red edges alone, each having the same household size label. Such local epidemics are realisations of the acquaintance model studied by Diekmann et al. (1998) and a special case of a standard SIR epidemic on a configuration-model random network, see, for example, Newman (2002b). Note that, if $n$ is large, the graph of red edges in the rewired network is locally tree-like. For $h = 2, 3, \cdots$, let $\tilde{\mathcal{E}}^{(h)}$ denote an SIR epidemic, with one initial infective, on a tree in which each node has degree $h - 1$, with infectious period distributed according to $I$ and infection rate $\lambda$. Then for large $n$, a local epidemic in the rewired process may be approximated by $\tilde{\mathcal{E}}^{(h)}$ and all the results of Sections 4.1 and 4.2 continue to hold provided the single-household final size and susceptibility set random variables $T^{(h)}$ and $M^{(h)}$ are replaced by their corresponding rewired counterparts defined on $\tilde{\mathcal{E}}^{(h)}$, which we denote by $\tilde{T}^{(h)}$ and $\tilde{M}^{(h)}$. As usual, the approximation of a local epidemic by $\tilde{\mathcal{E}}^{(h)}$ can be made exact in the limit as $n \to \infty$ via a coupling argument.

Each individual in a household of size 2 has precisely one red stub, so when the corresponding red stubs are paired up such individuals are partitioned into households of size 2 as before, whence $\tilde{T}^{(2)} = T^{(2)}$ and $\tilde{M}^{(2)} = M^{(2)}$. Fix $h > 2$ and consider a typical local epidemic $\tilde{\mathcal{E}}^{(h)}$. The initial infective in $\tilde{\mathcal{E}}^{(h)}$ has $h - 1$ susceptible neighbours, while any subsequent infective in the local epidemic has $h - 2$ susceptible neighbours. Any given infective infects any given susceptible neighbour with probability $p_I = 1 - \phi_I(\lambda)$. Thus in the (single-type) branching process, $\mathcal{B}_F^{(h)}$ say, which gives the size of successive generations of infectives in $\tilde{\mathcal{E}}^{(h)}$, the ancestor has offspring mean $(h - 1)p_I$ and all subsequent individuals have offspring mean $(h - 2)p_I$, whence

\[ \tilde{\mu}^{(h)}(\lambda) = E[\tilde{T}^{(h)}] = \begin{cases} (h - 1)p_I \left[1 - (h - 2)p_I\right]^{-1} & \text{if } p_I < \frac{1}{h-2}, \\ \infty & \text{if } p_I \ge \frac{1}{h-2}. \end{cases} \tag{32} \]

Suppose now that $I \equiv \iota$, so any infective in $\tilde{\mathcal{E}}^{(h)}$ infects each of its neighbours independently with probability $p_I$. Then the offspring distribution of the ancestor in $\mathcal{B}_F^{(h)}$ is Bin$(h - 1, p_I)$ and the offspring distribution of any subsequent individual is Bin$(h - 2, p_I)$, where Bin$(n, p)$ denotes a binomial distribution having $n$ trials and success probability $p$. Standard branching process arguments then yield that, for $h = 1, 2, \cdots$,

\[ f_{\tilde{T}^{(h)}}(s) = \left(1 - p_I + p_I f^{(h)}(s)\right)^{h-1} \qquad (0 \le s \le 1), \tag{33} \]

where $f^{(h)}(s)$ is the unique solution in $[0, 1]$ of the equation

\[ f^{(h)}(s) = s\left(1 - p_I + p_I f^{(h)}(s)\right)^{h-2}, \]

cf. equations (17) and (18) of Newman (2002b); note that $f^{(h)}(s)$ is the PGF of the total progeny of a typical non-ancestor in $\mathcal{B}_F^{(h)}$.
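
The implicit equation for the total-progeny PGF can be solved by simple fixed-point iteration, after which (33) gives the PGF of the rewired local final size. The following is a minimal sketch under the constant infectious period assumption; the function names and the values $h = 4$, $p_I = 0.3$ are hypothetical. Iterating from zero selects the smallest solution in $[0, 1]$, and a numerical derivative at $s = 1$ can be compared with the subcritical branch of (32).

```python
def rewired_final_size_pgf(s, h, p_I, tol=1e-14, max_iter=100_000):
    """Evaluate the PGF (33) of the rewired local final size at s, for h >= 2."""
    f = 0.0
    for _ in range(max_iter):
        # Fixed-point map for the total progeny of a non-ancestor.
        f_next = s * (1.0 - p_I + p_I * f) ** (h - 2)
        converged = abs(f_next - f) < tol
        f = f_next
        if converged:
            break
    return (1.0 - p_I + p_I * f) ** (h - 1)

def rewired_mean_final_size(h, p_I):
    """Mean rewired local final size, following (32)."""
    if h > 2 and p_I >= 1.0 / (h - 2):
        return float("inf")
    return (h - 1) * p_I / (1.0 - (h - 2) * p_I)

# Hypothetical subcritical example: (h - 2) * p_I = 0.6 < 1.
h, p_I = 4, 0.3
eps = 1e-6
numerical_mean = (rewired_final_size_pgf(1.0, h, p_I)
                  - rewired_final_size_pgf(1.0 - eps, h, p_I)) / eps
exact_mean = rewired_mean_final_size(h, p_I)
```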

Consider now the branching process, $\mathcal{B}_B^{(h)}$ say, that describes on a generation basis a typical local susceptibility set associated with $\tilde{\mathcal{E}}^{(h)}$ and return to the case of a general infectious period distribution. It is easily seen that the offspring distributions of the ancestor and any subsequent individual in $\mathcal{B}_B^{(h)}$ are Bin$(h - 1, p_I)$ and Bin$(h - 2, p_I)$, respectively, where $p_I = 1 - \phi_I(\lambda)$, so $f_{\tilde{M}^{(h)}}(s)$ is given by the right hand side of (33).

Finally we consider the case when the rewiring probability $p_{RW} \in (0, 1)$. Then, for example, the size $T^{(h)}(p_{RW})$ of a typical local epidemic corresponding to households having size $h$ is distributed according to $\tilde{T}^{(h)}$, with probability $p_{RW}$, and to $T^{(h)}$, with probability $1 - p_{RW}$. Thus, $E[T^{(h)}(p_{RW})] = (1 - p_{RW})\mu^{(h)}(\lambda) + p_{RW}\tilde{\mu}^{(h)}(\lambda)$, $f_{T^{(h)}(p_{RW})}(s) = (1 - p_{RW}) f_{T^{(h)}}(s) + p_{RW} f_{\tilde{T}^{(h)}}(s)$ and $f_{M^{(h)}(p_{RW})}(s) = (1 - p_{RW}) f_{M^{(h)}}(s) + p_{RW} f_{\tilde{M}^{(h)}}(s)$. The threshold parameter $R_*$, probability of a major epidemic $p_{maj}$ and relative final size of a major outbreak $z$ now follow by appropriate substitution into the results in Sections 4.1 and 4.2.

5.2 Effect of rewiring 

We now examine the qualitative effect of rewiring on the probability and relative final size of a major outbreak. For the model with r = 0, constant infectious period and fixed household size (i.e. P(H = h) = 1 for some h), Gleeson et al. (2010) use an analytic argument to show that the bond percolation threshold (the value of $p_I$ so that $R_* = 1$) is larger for the model with full rewiring ($p_{RW} = 1$) than for the model with no rewiring ($p_{RW} = 0$). Miller (2009) proves a similar result, again using an analytic argument, for an alternative model of random clustered networks, involving triangles, and also shows that the relative final size $z$ of a major outbreak is smaller for the fully rewired network than for the corresponding model without rewiring. Here we employ a coupling argument, similar to that in, for example, Mollison (1977) and Ball (1983), to prove that for our model $R_*$, $p_{maj}$ and $z$ are all increasing functions of the rewiring probability $p_{RW}$. The coupling argument is both intuitive and powerful. It may be extended to the model of Gleeson et al. (2010), without the restriction of a common household size, to the models of Miller (2009) and Newman (2009), and to the extension of the latter model proposed by Karrer and Newman (2010) that incorporates more general subgraphs than triangles.

For $h = 1, 2, \cdots$, let $\mathcal{E}^{(h)}$ denote the single size-$h$ household epidemic introduced in Section 4.1.2, so $T^{(h)}$ is the final size of $\mathcal{E}^{(h)}$ not including the initial infective. For fixed $h > 2$, a realisation of $\mathcal{E}^{(h)}$, viewed in generations of infectives, may be constructed from a realisation of $\mathcal{B}_F^{(h)}$ as follows. The ancestor of $\mathcal{B}_F^{(h)}$ corresponds to the initial infective in $\mathcal{E}^{(h)}$. The number of individuals, $Z_1$ say, in the first generation in $\mathcal{B}_F^{(h)}$ (i.e. the offspring of the ancestor) gives the number of people directly infected by the initial infective in $\mathcal{E}^{(h)}$. The individuals so infected are obtained by sampling $Z_1$ individuals uniformly at random without replacement from the $h - 1$ individuals in the household excluding the initial infective. The sampled individuals form the first generation of infectives in $\mathcal{E}^{(h)}$. We now consider each first-generation individual in the branching process $\mathcal{B}_F^{(h)}$ in turn. The immediate offspring of such a first-generation individual give the number of people with which the corresponding infective in $\mathcal{E}^{(h)}$ makes infectious contact. The people so contacted are obtained by sampling uniformly at random without replacement from the $h - 1$ individuals in the household excluding the infective under consideration. It is possible that a person so contacted has already been infected in $\mathcal{E}^{(h)}$, in which case the corresponding birth in $\mathcal{B}_F^{(h)}$ and all of the descendants of that individual in $\mathcal{B}_F^{(h)}$ are ignored in the construction of $\mathcal{E}^{(h)}$. The construction of $\mathcal{E}^{(h)}$ continues in the obvious fashion and terminates when there is no infective remaining in the household.

Observe that by construction the size of the epidemic $\mathcal{E}^{(h)}$ is not larger than the total progeny of the branching process $\mathcal{B}_F^{(h)}$, so $\tilde{T}^{(h)} \stackrel{st}{\ge} T^{(h)}$, where $\stackrel{st}{\ge}$ denotes stochastic ordering, whence $\tilde{\mu}^{(h)}(\lambda) \ge \mu^{(h)}(\lambda)$ and $f_{\tilde{T}^{(h)}}(s) \le f_{T^{(h)}}(s)$ $(0 \le s \le 1)$. Moreover, provided $\lambda\mu_I > 0$, these inequalities are strict for all $h \ge 3$ and all $s \in [0, 1)$. It follows that, if all other parameters are held fixed, the threshold parameter $R_*$ is an increasing function of the rewiring probability $p_{RW}$, as is the probability of a major outbreak $p_{maj}$ (assuming that the infectious period is constant). When the infectious period is not constant, the above coupling can be extended to include the global degrees of individuals in such a way that infectives in the household epidemic $\mathcal{E}^{(h)}$ have the same global degree and make the same global infectious contacts as the corresponding individuals in the branching process $\mathcal{B}_F^{(h)}$, from which it follows that $p_{maj}$ is increasing in $p_{RW}$. Moreover, if $P(H \ge 3) > 0$ and $\lambda\mu_I > 0$ then both $R_*$ and $p_{maj}$ are strictly increasing in $p_{RW}$.

Turning to the final outcome of a major outbreak, for fixed $h > 2$, we can construct a realisation of the local susceptibility set, $\mathcal{S}^{(h)}$ say, of an individual, i* say, who resides in a household of size $h$, from a realisation of the branching process $\mathcal{B}_B^{(h)}$ as follows. The local susceptibility set of i* is constructed on a generation basis. The ancestor of $\mathcal{B}_B^{(h)}$ corresponds to the individual i*. The first generation of $\mathcal{B}_B^{(h)}$ gives the number of individuals in i*'s household who would make infectious contact with i* if they were to become infected; who these individuals (who form the first generation of $\mathcal{S}^{(h)}$) are is then determined by sampling without replacement as above. We next consider in turn each member, j* say, of the first generation of $\mathcal{S}^{(h)}$ and determine which of those individuals not currently in $\mathcal{S}^{(h)}$ would join the susceptibility set of i* by virtue of making infectious contact with j*. Suppose that j* is the $k$th first-generation member of $\mathcal{S}^{(h)}$ to be considered in this fashion. Then any individual not currently in $\mathcal{S}^{(h)}$ has failed to infect $k$ individuals, so the probability that it fails to infect j* is given by $P_F(k) = \phi_I((k + 1)\lambda)/\phi_I(k\lambda)$. Moreover, since such individuals are distinct, they each fail to infect j* independently with probability $P_F(k)$. Let $P_F(0) = \phi_I(\lambda)$. We now prove that, as one would expect on intuitive grounds, for any $\lambda > 0$, $P_F(k) \ge P_F(0)$ $(k = 1, 2, \cdots)$, with strict inequality unless $I \equiv \iota$ for some $\iota > 0$.

Define the function $\eta$ by $\eta(\theta) = \log \phi_I(\theta)$ $(\theta \ge 0)$. Then $\eta$ is a convex function, since $\phi_I$ is a moment generating function, and $\eta(0) = 0$. Thus, $\eta(\lambda) \le \frac{1}{k+1}\eta((k + 1)\lambda)$ and $\eta(k\lambda) \le \frac{k}{k+1}\eta((k + 1)\lambda)$, whence

\[ \eta(\lambda) + \eta(k\lambda) \le \eta((k + 1)\lambda), \tag{34} \]

which implies that $P_F(k) \ge P_F(0)$ $(k = 1, 2, \cdots)$. Moreover, if the infectious period random variable $I$ is not almost surely constant then $\eta$ is a strictly convex function, so, provided $\lambda > 0$, the inequality in (34) is strict and $P_F(k) > P_F(0)$ $(k = 1, 2, \cdots)$.
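
The inequality can be illustrated numerically. The sketch below evaluates $P_F(k) = \phi_I((k+1)\lambda)/\phi_I(k\lambda)$ for two infectious period distributions chosen purely for illustration: an exponential distribution with rate `gamma`, for which $\phi_I(\theta) = \gamma/(\gamma + \theta)$, and a constant period of length `iota`. The parameter values are hypothetical.

```python
from math import exp

def fail_to_infect_probability(k, phi, lam):
    """P_F(k): probability of failing to infect a further individual,
    given failure to infect k individuals already (P_F(0) = phi(lam))."""
    return phi((k + 1) * lam) / phi(k * lam)

lam, gamma, iota = 0.5, 1.0, 1.0          # hypothetical parameter values
phi_exponential = lambda theta: gamma / (gamma + theta)
phi_constant = lambda theta: exp(-theta * iota)

pf_exponential = [fail_to_infect_probability(k, phi_exponential, lam) for k in range(6)]
pf_constant = [fail_to_infect_probability(k, phi_constant, lam) for k in range(6)]
```

With the exponential distribution the values increase strictly in $k$, while with the constant period every value equals $\exp(-\lambda\iota)$, in line with the equality case of the result.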

In view of the above result, the individuals who join the susceptibility set $\mathcal{S}^{(h)}$ by virtue of making infectious contact with j* may be determined as follows. Let $Z_{j*}$ be the number of immediate offspring of the individual in $\mathcal{B}_B^{(h)}$ that corresponds to j* and note that $Z_{j*} \sim$ Bin$(h - 2, 1 - P_F(0))$. Given $Z_{j*}$, sample $\hat{Z}_{j*}$ from the binomial distribution Bin$\left(Z_{j*}, \frac{1 - P_F(k)}{1 - P_F(0)}\right)$ and then sample $\hat{Z}_{j*}$ individuals uniformly at random without replacement from the $h - 1$ individuals in the household excluding j*. Any individual in this latter sample that is not currently in $\mathcal{S}^{(h)}$ is added to $\mathcal{S}^{(h)}$. This process is repeated for all j* belonging to the first generation of $\mathcal{S}^{(h)}$, thus yielding the second generation of $\mathcal{S}^{(h)}$, and so on. Observe that, by construction, any individual in $\mathcal{S}^{(h)}$ has a corresponding individual in $\mathcal{B}_B^{(h)}$, so $\tilde{M}^{(h)} \stackrel{st}{\ge} M^{(h)}$, whence $f_{\tilde{M}^{(h)}}(s) \le f_{M^{(h)}}(s)$ $(0 \le s \le 1)$, with strict inequality for $h \ge 3$ and $0 \le s < 1$ provided $\lambda\mu_I > 0$. It follows that the relative final size $z$ of a major outbreak is increasing in the rewiring probability $p_{RW}$, and strictly increasing if $P(H \ge 3) > 0$ and $\lambda\mu_I > 0$.



6 Numerical examples 

In this section we explore some properties of our network epidemic model numerically. We restrict our attention to the Reed-Frost type version of our model, i.e. we assume that $I \equiv \iota$ for some $\iota > 0$, which implies that $p_{maj} = z$, and rather than dealing explicitly with $I$ and the contact rate $\lambda$ we refer to the marginal infection probability $p_I = 1 - \exp(-\lambda\iota)$. Also, we use the notation Poi and Poi$^+$ for global degree and household size distributions, as in Section 3.5.

First we briefly investigate the convergence of $p_{maj}$ and $z$ for finite populations (derived empirically from simulations) to the asymptotic values (derived analytically) as the number of nodes/individuals $n$ becomes large. Figure 2 shows this behaviour in $p_{maj}$ and $z$, for fixed $G$, $H$, $n_Q$, $p_I$ and varying $r \in [-1, 1]$, comparing the asymptotic results to empirical estimates from networks of size $n = 1{,}000$ and $10{,}000$ nodes/individuals. Each empirical estimate of a quantity of interest is based on $n_0 = 1{,}000$ simulations and is represented by an approximate 95.4% confidence interval, calculated as a point estimate ± 2 standard errors (SE). (Also note that each simulation consists of generating a network then running an epidemic on it; we do not just run 1,000 epidemics on a single randomly generated network.) Each point estimate of $p_{maj}$ is simply the proportion $p$ of simulations that took off into a major outbreak (the cutoff between minor and major outbreaks being determined by inspecting histograms of epidemic final size), and $\mathrm{SE} = (p(1 - p)/n_0)^{1/2}$. The point estimate of $z$ is the mean fraction of the population ultimately infected by a major outbreak and here $\mathrm{SE} = \sigma n_1^{-1/2}$, where $\sigma^2$ is the sample variance of the fraction of the population ultimately infected by a major outbreak and $n_1$ is the number of simulations that resulted in a major outbreak. As was explained in the closing sentences of Section 5 of Ball et al. (2009), our simulation methods yield much tighter confidence bands for $z$ than for $p_{maj}$ since each simulation effectively gives a single realisation of the epidemic process but each simulation that results in a major outbreak gives $n - 1$ (highly correlated) realisations of the susceptibility set process.
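
The estimation procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the input format (a list of final fractions infected, one per simulation) and the numerical `major_cutoff` argument are assumptions made here, whereas the text chooses the cutoff by inspecting histograms.

```python
import math
from statistics import mean, variance


def summarise_outbreaks(final_fractions, major_cutoff):
    """Point estimates and standard errors for p_maj and z.

    final_fractions: fraction of the population ultimately infected in each
        simulation (one entry per simulated network-plus-epidemic).
    major_cutoff: hypothetical threshold separating minor from major
        outbreaks (an assumption of this sketch).
    """
    n0 = len(final_fractions)
    major = [f for f in final_fractions if f > major_cutoff]
    n1 = len(major)
    if n1 < 2:
        raise ValueError("need at least two major outbreaks to estimate the SE of z")

    # Proportion of simulations that became major outbreaks, binomial SE.
    p_hat = n1 / n0
    se_p = math.sqrt(p_hat * (1 - p_hat) / n0)

    # Mean final fraction among major outbreaks, SE = sigma / sqrt(n1).
    z_hat = mean(major)
    se_z = math.sqrt(variance(major) / n1)  # sample variance (divisor n1 - 1)

    return {"p_maj": (p_hat, se_p), "z": (z_hat, se_z)}


def interval(estimate, se, width=2.0):
    """Approximate confidence interval: estimate +/- width * SE."""
    return (estimate - width * se, estimate + width * se)
```

For example, `interval(*summarise_outbreaks(fractions, 0.1)["z"])` would give the ± 2 SE band for $z$ from a list `fractions` of simulated final sizes.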

We see that for networks with only 1,000 nodes the asymptotic values of $p_{maj}$ seem to be very good approximations to the empirically calculated major outbreak probabilities across all values of $r$. The expected relative final size also seems to be well approximated by the asymptotic values even for $n = 1{,}000$, though there does appear to be some bias, which is more pronounced for more extreme values of $r$. One explanation for this is that when $r$ is close to $-1$ or $1$, there are more imperfections in the random graph (self-loops, household self-loops, etc.) and so the branching process approximation breaks down sooner. Nevertheless, the $z$ plots lend considerable credence to our conjecture in Section 4.2 that the expected relative final size of a major outbreak converges to the survival probability of the branching process $\mathcal{B}$ as $n \to \infty$.

Figure 2: Plots comparing empirical estimates ($n < \infty$) and asymptotic values ($n \to \infty$) of $p_{maj}$ and $z$, as a function of $r$, for our model with degree distributions $H \sim \mathrm{Poi}^+(2)$ and $G \sim \mathrm{Poi}(8)$ ($c = 0.04$) and $H \sim \mathrm{Poi}^+(4)$ and $G \sim \mathrm{Poi}(6)$ ($c = 0.16$). Other parameters are $n_Q = 10$ and $p_I = 0.2$. Empirical estimates are for network sizes $n = 1{,}000$ and $n = 10{,}000$, each estimate being based on 1,000 simulations. Note that the scales on the vertical axes of these plots vary considerably.

Figure 3: Plot of $p_{maj}$ versus $r$ for varying values of $p_I$. $G \sim \mathrm{Poi}(10 - \mu_H)$ and $H \sim \mathrm{Poi}^+(\mu_H)$, with $\mu_H$ taking the values, in order, 0.1, 2, 4, 6; corresponding to clustering coefficients $10^{-4}$, 0.04, 0.16, 0.36. Note also that the $p_I$ values used are the same in each plot except for the smallest value, which is chosen so that the epidemic is just supercritical for all values of $r$.

Having seen that our asymptotic results give reasonable descriptions of the behaviour of our epidemic model on a moderately sized finite network, we turn our attention to investigating the effect of some of the parameters of our model on its (asymptotic) behaviour. We focus initially on the qualitative behaviour of $p_{maj}(= z)$ considered as a function of $r$ (and $p_I$). Figure 3 illustrates this behaviour in the case where $G \sim \mathrm{Poi}(10 - \mu_H)$, $H \sim \mathrm{Poi}^+(\mu_H)$, so $D \sim \mathrm{Poi}(10)$, and $n_Q = 10$, for various values of $\mu_H \in [0, 10)$ (and therefore $c = (\mu_H/10)^2$).
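
As a quick check of the relation between the household parameter and the clustering coefficient in this Poisson family, the following sketch evaluates $c = (\mu_H/10)^2$ for the parameter values used in the figures. The function name and the `mean_total_degree` argument are illustrative choices made here; the formula itself is only the one quoted above for the case $D \sim \mathrm{Poi}(10)$, not a general expression for $c$.

```python
def clustering_coefficient(mu_h, mean_total_degree=10.0):
    """Clustering coefficient c = (mu_H / 10)^2 for the Poisson example.

    Valid only for the family G ~ Poi(10 - mu_H), H ~ Poi+(mu_H) discussed
    in the text; mean_total_degree is kept as a parameter for readability.
    """
    return (mu_h / mean_total_degree) ** 2


# Household parameters used in the figures of this section.
for mu_h in (0.1, 2, 4, 6):
    print(f"mu_H = {mu_h}: c = {clustering_coefficient(mu_h):.4f}")
```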

We see a variety of patterns in the dependence of $p_{maj}$ on $r$ as $p_I$ and $c$ are varied. Broadly, when the process is well above criticality the dependence is not very strong, but when the process is only just supercritical changes in $r$ in particular (and thus in the degree correlation) can have a substantial impact on the epidemic model. The interesting (and somewhat unexpected) qualitative behaviour observed in the $p_I = 0.105$ line in plot (a) is explored in further detail in Figure 4. Note, however, that the model parameters that give rise to this behaviour are $\mu_G = 9.9$ and $\mu_H = 0.1$, so there is essentially no clustering in the network; clearly further work is required to determine whether the model behaves in such a way with other, more realistic parameter values. Nevertheless, a wide range of values of $p_{maj}(= z)$ for different values of $r$ (i.e. degree correlation) is observed near criticality in all of the plots in Figure 3, even though the non-monotonicity is only observed in plot (a).

Figure 4: Plot of $p_{maj}$ versus $r$ for near-critical values of $p_I$, when $G \sim \mathrm{Poi}(9.9)$, $H \sim \mathrm{Poi}^+(0.1)$ and $n_Q = 10$. (Note that the $p_I = 0.103$ line is positive near $r = -1$.)

Finally, Figure 5 illustrates the effect on $p_{maj}(= z)$ of changing $c$, keeping $\rho$ and $p_I$ fixed, for the case when the total degree $D \sim \mathrm{Poi}(10)$ and $n_Q = 10$. The degree correlation $\rho$ is held fixed at $\rho = 0.2$ and, for the unrewired model, the clustering coefficient $c$ is tuned to be any value in its feasible range (see Figure 1) by varying $\mu_H$ and using (7). The maximum value of $c$ consistent with $\rho = 0.2$ is $c = 0.4855$, which is attained when $r = -1$ and $\mu_H = 6.9676$. For the rewired model, the clustering coefficient is tuned by taking the unrewired model with $r = -1$ and $\mu_H = 6.9676$ and letting the rewiring probability $p_{rw}$ vary in $[0, 1]$. Figure 5 shows how $p_{maj}(= z)$ varies with $c$ for both the unrewired and rewired models. Note that, as one might expect, $p_{maj}(= z)$ decreases with $c$ for both models; indeed this is proved formally for the rewired model in Section 5.2. Note also that $p_{maj}(= z)$ is different for the two models, illustrating that these epidemic properties depend on more than just the local properties of the network encapsulated in $(D, c, \rho)$.

7 Discussion 

In this paper we define a network model which allows for quite arbitrary clustering $c$, degree correlation $\rho$ and degree distribution $D$, and asymptotic features of the model are derived. The main focus is on analysing an epidemic model on the network, and in particular what effect various network properties have on the epidemic in terms of its threshold parameter $R_*$, the probability $p_{maj}$ of a major outbreak, and the relative size $z$ of a major outbreak. The main conclusion is that all three quantities $R_*$, $p_{maj}$ and $z$ are decreasing with the clustering coefficient $c$ (when rewiring edges in the network, thus keeping everything else fixed), whereas the dependence on the degree correlation $\rho$ is not as easily expressed: the quantities may be either increasing or decreasing depending on which part of the parameter space is being investigated. To our knowledge this is the first network model having such general features for which the properties of an epidemic are analysed in this level of detail.

Figure 5: Plot of $p_{maj}(= z)$ versus $c$ when $D \sim \mathrm{Poi}(10)$, $\rho = 0.2$, $n_Q = 10$ and $p_I = 0.15$.

A disadvantage with the model is that, in general, there is no simple and explicit relation between the model parameters $H$, $G$, $r$ and $n_Q$ and the more interesting network properties $c$, $\rho$ and $D$. Note, however, the relation $D = H - 1 + G$ for $D$, and the facts that $\rho$ is increasing with $r$ and $c$ is increasing in $H$ (in the sense that $c(G, H_1, r) \ge c(G, H_2, r)$ if $H_1 \ge_{st} H_2$), keeping other parameters fixed. A model having simpler relationships to the local network properties could be more easily interpreted and would hence be of interest.
The use of appropriate pairing of stubs to control degree correlation, as done in this paper, 
could be applied to other models of clustered networks, such as those in Newman (2009), 
Miller (2009) and Karrer and Newman (2010). 

It is important to observe that, as illustrated in Figure 5, there may be distinct network
models having the same local network features D, p and c but still giving different proper- 
ties of an epidemic, the latter being a global property. In applications it is hence important 
to fit not only local properties of a network model to empirical network data, but also to 
study the definitions of the model and try to understand if the model mechanism seems 
to agree realistically with how the empirical network may have been constructed.

and by the Swedish Research Council (TB).

Appendix: Derivation of degree correlation $\rho$



In the appendix we derive the formula for the degree correlation $\rho$ for our model given in equation (6). Let $E$ denote an edge chosen uniformly at random from all edges in the network, and let $X_L$ and $X_R$ denote the total degrees of the nodes adjacent to $E$. Then $\rho = \mathrm{corr}(X_L, X_R)$, i.e. the correlation between $X_L$ and $X_R$. Let $I_G = 1$ if $E$ is a global edge and $I_G = 0$ if $E$ is a household edge, so $\mathrm{P}(I_G = 1) = p_G = 1 - \mathrm{P}(I_G = 0)$. We determine first the probability $p_G$ that $E$ is a global edge.

Let $N_G$ and $N_H$ denote respectively the number of global and household edges in the network. Then the expected number of global edges per individual is $\tfrac{1}{2}\mu_G$, since each stub contributes to half an edge, and the expected number of household edges per individual is $\tfrac{1}{2}\mu_{H-1}$, since the household size of an individual chosen uniformly at random from the population is distributed according to $H$ and if such an individual resides in a household of size $h$ it has $h - 1$ household neighbours. Letting $n \to \infty$ and using the strong law of large numbers shows that $p_G$ is given by (3).
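
The edge-counting argument above suggests that the limiting fraction of global edges is the ratio of the per-individual global edge count $\tfrac{1}{2}\mu_G$ to the total $\tfrac{1}{2}(\mu_G + \mu_{H-1})$. The sketch below computes that ratio; it is a reading of the argument under that assumption, not a transcription of the equation cited in the text, and the function and argument names are illustrative.

```python
def global_edge_probability(mu_g, mu_h_minus_1):
    """Limiting probability that a uniformly chosen edge is global.

    mu_g: mean global degree E[G].
    mu_h_minus_1: mean number of household neighbours E[H - 1].
    Assumes each individual contributes mu_g / 2 global edges and
    mu_h_minus_1 / 2 household edges on average.
    """
    global_edges = mu_g / 2
    household_edges = mu_h_minus_1 / 2
    return global_edges / (global_edges + household_edges)


# Example with mean global degree 8 and a mean of 2 household neighbours.
print(global_edge_probability(8, 2))
```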

Note that

$$\mathrm{cov}(X_L, X_R) = \mathrm{E}[\mathrm{cov}(X_L, X_R \mid I_G)] + \mathrm{cov}(\mathrm{E}[X_L \mid I_G], \mathrm{E}[X_R \mid I_G]). \tag{35}$$

We calculate the two quantities on the right hand side of (35) in turn.
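
The decomposition (35) is the law of total covariance. As a sanity check, the sketch below verifies it exactly on a small, entirely made-up joint distribution of an indicator and two degrees; the toy probabilities and all function names are illustrative assumptions and have nothing to do with the network model itself.

```python
from collections import defaultdict
from fractions import Fraction


def expectation(pmf, f):
    """E[f(I, X, Y)] under a joint pmf {(i, x, y): probability}."""
    return sum(p * f(i, x, y) for (i, x, y), p in pmf.items())


def covariance(pmf):
    """cov(X, Y) under the joint pmf."""
    ex = expectation(pmf, lambda i, x, y: x)
    ey = expectation(pmf, lambda i, x, y: y)
    return expectation(pmf, lambda i, x, y: x * y) - ex * ey


def decompose(pmf):
    """Return (E[cov(X, Y | I)], cov(E[X | I], E[Y | I]))."""
    by_i = defaultdict(dict)
    for (i, x, y), p in pmf.items():
        by_i[i][(i, x, y)] = p

    within = Fraction(0)
    cond = []  # entries (P(I = i), E[X | I = i], E[Y | I = i])
    for sub in by_i.values():
        w = sum(sub.values())
        sub_pmf = {k: p / w for k, p in sub.items()}  # conditional pmf given I = i
        within += w * covariance(sub_pmf)
        cond.append((w,
                     expectation(sub_pmf, lambda i, x, y: x),
                     expectation(sub_pmf, lambda i, x, y: y)))

    mean_x = sum(w * ex for w, ex, _ in cond)
    mean_y = sum(w * ey for w, _, ey in cond)
    between = sum(w * (ex - mean_x) * (ey - mean_y) for w, ex, ey in cond)
    return within, between


# Toy joint distribution of (I, X, Y); values chosen only for illustration.
toy = {
    (0, 1, 1): Fraction(1, 4),
    (0, 3, 2): Fraction(1, 4),
    (1, 2, 5): Fraction(1, 4),
    (1, 6, 4): Fraction(1, 4),
}
within, between = decompose(toy)
print(within, between, covariance(toy))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to floating-point tolerance.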

Suppose that $I_G = 0$, so $E$ is a household edge. Then $X_L = H_E - 1 + G_L$ and $X_R = H_E - 1 + G_R$, where $H_E$ is the size of the household that contains the edge $E$, and $G_L$ and $G_R$ are the global degrees of the nodes adjacent to $E$. Observe that $H_E$ is distributed as $\tilde{H}$ and, since $I_G = 0$, $G_L$ and $G_R$ are independent copies of $G$. Thus,

$$\mathrm{cov}(X_L, X_R \mid I_G = 0) = \sigma^2_{\tilde{H}}. \tag{36}$$



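Suppose that $I_G = 1$, so $E$ is a global edge. Let $Q_L$ and $Q_R$ be the total degree quantiles of the two stubs used to form the edge $E$. Then, for $i, j = 1, 2, \dots, n_Q$,

$$\mathrm{P}(Q_L = i, Q_R = j) = \begin{cases} \dfrac{1 - |r|}{n_Q^2} + \dfrac{|r|}{n_Q}\,\delta_{i,j} & \text{if } r \ge 0, \\[1ex] \dfrac{1 - |r|}{n_Q^2} + \dfrac{|r|}{n_Q}\,\delta_{i,\,n_Q + 1 - j} & \text{if } r < 0. \end{cases} \tag{37}$$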

Now,

$$\mathrm{cov}(X_L, X_R \mid I_G = 1) = \mathrm{E}[\mathrm{cov}(X_L, X_R \mid I_G = 1, Q_L, Q_R)] + \mathrm{cov}(\mathrm{E}[X_L \mid I_G = 1, Q_L], \mathrm{E}[X_R \mid I_G = 1, Q_R]). \tag{38}$$

Given $(Q_L, Q_R)$, the total degrees $X_L$ and $X_R$ are independent, so

$$\mathrm{cov}(X_L, X_R \mid I_G = 1, Q_L, Q_R) = 0. \tag{39}$$

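Further, for $i = 1, 2, \dots, n_Q$, $\mathrm{E}[X_L \mid I_G = 1, Q_L = i] = \mathrm{E}[X_R \mid I_G = 1, Q_R = i]$ (see equation (5)). Using the distribution (37) and noting that $\mathrm{E}[X_L \mid I_G = 1] = n_Q^{-1}\sum_{i=1}^{n_Q}\mathrm{E}[X_L \mid I_G = 1, Q_L = i]$ yields

$$\mathrm{cov}(\mathrm{E}[X_L \mid I_G = 1, Q_L], \mathrm{E}[X_R \mid I_G = 1, Q_R]) = g_{D,n_Q}(r), \tag{40}$$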
where $g_{D,n_Q}(r)$ is as defined in the main text.

Note that $\mathrm{P}(I_G = 1) = p_G = 1 - \mathrm{P}(I_G = 0)$. Then, equations (36), (38), (39) and (40) yield

$$\mathrm{E}[\mathrm{cov}(X_L, X_R \mid I_G)] = (1 - p_G)\,\sigma^2_{\tilde{H}} + p_G\, g_{D,n_Q}(r). \tag{41}$$

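We turn now to the second quantity on the right hand side of (35). Note that $\mathrm{E}[X_L \mid I_G] = \mathrm{E}[X_R \mid I_G]$, so $\mathrm{cov}(\mathrm{E}[X_L \mid I_G], \mathrm{E}[X_R \mid I_G]) = \mathrm{var}(\mathrm{E}[X_L \mid I_G])$. Suppose that $I_G = 0$. Then, in the above notation, $X_L = H_E - 1 + G_L$, where $G_L \overset{D}{=} G$. Thus,

$$\mathrm{E}[X_L \mid I_G = 0] = \mu_{\tilde{H} - 1} + \mu_G. \tag{42}$$

Suppose that $I_G = 1$. Then $X_L \overset{D}{=} \tilde{D}$ and recall that $\tilde{D} \overset{D}{=} H - 1 + \tilde{G}$. Thus,

$$\mathrm{E}[X_L \mid I_G = 1] = \mu_{H - 1} + \mu_{\tilde{G}}. \tag{43}$$

Recalling that $\mathrm{P}(I_G = 1) = p_G = 1 - \mathrm{P}(I_G = 0)$ and that $\mu_{\tilde{G}} = \mathrm{E}[G^2]/\mu_G$, so that $\mu_{\tilde{G}} - \mu_G = \sigma_G^2/\mu_G$, equations (42) and (43) yield

$$\mathrm{cov}(\mathrm{E}[X_L \mid I_G], \mathrm{E}[X_R \mid I_G]) = p_G(1 - p_G)\left(\mu_{\tilde{H}} - \mu_H - \frac{\sigma_G^2}{\mu_G}\right)^2. \tag{44}$$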



Combining equations (35), (41) and (44) gives

$$\mathrm{cov}(X_L, X_R) = (1 - p_G)\,\sigma^2_{\tilde{H}} + p_G\, g_{D,n_Q}(r) + p_G(1 - p_G)\left(\mu_{\tilde{H}} - \mu_H - \frac{\sigma_G^2}{\mu_G}\right)^2. \tag{45}$$



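We now derive $\mathrm{var}(X_L)$. First note that

$$\mathrm{var}(X_L) = \mathrm{E}[\mathrm{var}(X_L \mid I_G)] + \mathrm{var}(\mathrm{E}[X_L \mid I_G]). \tag{46}$$

As above, if $I_G = 0$ then $X_L = H_E - 1 + G_L$, where $H_E \overset{D}{=} \tilde{H}$ and $G_L \overset{D}{=} G$ are independent, so $\mathrm{var}(X_L \mid I_G = 0) = \sigma^2_{\tilde{H}} + \sigma^2_G$; and if $I_G = 1$ then $X_L \overset{D}{=} H - 1 + \tilde{G}$, where $H$ and $\tilde{G}$ are independent, so $\mathrm{var}(X_L \mid I_G = 1) = \sigma^2_H + \sigma^2_{\tilde{G}}$. Hence,

$$\mathrm{E}[\mathrm{var}(X_L \mid I_G)] = (1 - p_G)\left(\sigma^2_{\tilde{H}} + \sigma^2_G\right) + p_G\left(\sigma^2_H + \sigma^2_{\tilde{G}}\right),$$

which on substituting into (46), recalling that $\mathrm{var}(\mathrm{E}[X_L \mid I_G]) = \mathrm{cov}(\mathrm{E}[X_L \mid I_G], \mathrm{E}[X_R \mid I_G])$ and using (44) yields

$$\mathrm{var}(X_L) = (1 - p_G)\left(\sigma^2_{\tilde{H}} + \sigma^2_G\right) + p_G\left(\sigma^2_H + \sigma^2_{\tilde{G}}\right) + p_G(1 - p_G)\left(\mu_{\tilde{H}} - \mu_H - \frac{\sigma_G^2}{\mu_G}\right)^2. \tag{47}$$

The expression (6) for the degree correlation $\rho$, given in Section 3.3, follows from equations (45) and (47), since $\mathrm{var}(X_L) = \mathrm{var}(X_R)$.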